GCC assumes that stack is aligned to 16-byte on call sites [1].
Since GCC 8, GCC began using 16-byte aligned SSE instructions to
implement assignments to structs on stack. When
CC_OPTIMIZE_FOR_PERFORMANCE is enabled, this affects
os-Linux/sigio.c, write_sigio_thread:
struct pollfds *fds, tmp;
tmp = current_poll;
Note that struct pollfds is exactly 16 bytes in size.
GCC 8+ generates assembly similar to:
movdqa (%rdi),%xmm0
movaps %xmm0,-0x50(%rbp)
This is an issue, because movaps will #GP if -0x50(%rbp) is not
aligned to 16 bytes [2], and how rbp gets assigned to is via glibc
clone thread_start, then function prologue, going though execution
trace similar to (showing only relevant instructions):
sub $0x10,%rsi
mov %rcx,0x8(%rsi)
mov %rdi,(%rsi)
syscall
pop %rax
pop %rdi
callq *%rax
push %rbp
mov %rsp,%rbp
The stack pointer always points to the topmost element on stack,
rather then the space right above the topmost. On push, the
pointer decrements first before writing to the memory pointed to
by it. Therefore, there is no need to have the stack pointer
pointer always point to valid memory unless the stack is poped;
so the `- sizeof(void *)` in the code is unnecessary.
On the other hand, glibc reserves the 16 bytes it needs on stack
and pops itself, so by the call instruction the stack pointer
is exactly the caller-supplied sp. It then push the 16 bytes of
the return address and the saved stack pointer, so the base
pointer will be 16-byte aligned if and only if the caller
supplied sp is 16-byte aligned. Therefore, the caller must supply
a 16-byte aligned pointer, which `stack + UM_KERN_PAGE_SIZE`
already satisfies.
On a side note, musl is unaffected by this issue because it forces
16 byte alignment via `and $-16,%rsi` in its clone wrapper.
Similarly, glibc i386 is also unaffected because it has
`andl $0xfffffff0, %ecx`.
To reproduce this bug, enable CONFIG_UML_RTC and
CC_OPTIMIZE_FOR_PERFORMANCE. uml_rtc will call
add_sigio_fd which will then cause write_sigio_thread to either go
into segfault loop or panic with "Segfault with no mm".
Similarly, signal stacks will be aligned by the host kernel upon
signal delivery. `- sizeof(void *)` to sigaltstack is
unconventional and extraneous.
On a related note, initialization of longjmp buffers do require
`- sizeof(void *)`. This is to account for the return address
that would have been pushed to the stack at the call site.
The reason for uml to respect 16-byte alignment, rather than
telling GCC to assume 8-byte alignment like the host kernel since
commit d9b0cde91c ("x86-64, gcc: Use
-mpreferred-stack-boundary=3 if supported"), is because uml links
against libc. There is no reason to assume libc is also compiled
with that flag and assumes 8-byte alignment rather than 16-byte.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[2] https://c9x.me/x86/html/file_module_x86_id_180.html
Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
This mostly reverts the old commit 3963333fe6 ("uml: cover stubs
with a VMA") which had added a VMA to the existing PTEs. However,
there's no real reason to have the PTEs in the first place and the
VMA cannot be 'fixed' in place, which leads to bugs that userspace
could try to unmap them and be forcefully killed, or such. Also,
there's a bit of an ugly hole in userspace's address space.
Simplify all this: just install the stub code/page at the top of
the (inner) address space, i.e. put it just above TASK_SIZE. The
pages are simply hard-coded to be mapped in the userspace process
we use to implement an mm context, and they're out of reach of the
inner mmap/munmap/mprotect etc. since they're above TASK_SIZE.
Getting rid of the VMA also makes vma_merge() no longer hit one of
the VM_WARN_ON()s there because we installed a VMA while the code
assumes the stack VMA is the first one.
It also removes a lockdep warning about mmap_sem usage since we no
longer have uml_setup_stubs() and thus no longer need to do any
manipulation that would require mmap_sem in activate_mm().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The userspace stacks mostly have a stack (and in the case of the
syscall stub we can just set their stack pointer) that points to
the location of the stub data page already.
Rework the stubs to use the stack pointer to derive the start of
the data page, rather than requiring it to be hard-coded.
In the clone stub, also integrate the int3 into the stack remap,
since we really must not use the stack while we remap it.
This prepares for putting the stub at a variable location that's
not part of the normal address space of the userspace processes
running inside the UML machine.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If the two are mixed up, then it looks as though the parent
returned an error if the child failed (before) the mmap(),
and then the resulting process never gets killed. Fix this
by splitting the child and parent errors, reporting and
using them appropriately.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
In some cases we can get to fix_range_common() with mmap_sem held,
and in others we get there without it being held. For example, we
get there with it held from sys_mprotect(), and without it held
from fork_handler().
Avoid any issues in this and simply defer killing the task until
it runs the next time. Do it on the mm so that another task that
shares the same mm can't continue running afterwards.
Cc: stable@vger.kernel.org
Fixes: 468f65976a ("um: Fix hung task in fix_range_common()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Since we're basically debugging the userspace (it runs in ptrace)
it's useful to dump out the registers - but they're not readable,
so if something goes wrong it's hard to say what. Print the names
of registers in the register dump so it's easier to look at.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
UML userspace fetches siginfo and passes it to signal handlers
in UML. This is needed only for some of the signals, because
key handlers like SIGIO make no use of this variable.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Convert files to use SPDX header. All files are licensed under the GPLv2.
Signed-off-by: Alex Dewar <alex.dewar@gmx.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
UML enables TRACE_IRQFLAGS_SUPPORT but doesn't actually implement
it. It seems to have been added for lockdep support, but that can't
actually really work well without IRQ flags tracing, as is also
very noisily reported when enabling CONFIG_DEBUG_LOCKDEP.
Implement it now.
Fixes: 711553efa5 ("[PATCH] uml: declare in Kconfig our partial LOCKDEP support")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Fixes:
arch/um/os-Linux/skas/process.c:613:1: warning: control reaches end of
non-void function [-Wreturn-type]
longjmp() never returns but gcc still warns that the end of the function
can be reached.
Add a return code and debug aid to detect this impossible case.
Signed-off-by: Richard Weinberger <richard@nod.at>
Also use correct function name spelling (stub_segv_handler) for better grepping
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
When ptrace fails to set GP/FP regs for the target process,
log the error before crashing the UML kernel.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Define NR_CPUS required by the timer subsystem.
Fixes this make warning:
scripts/kconfig/conf --oldconfig arch/x86/um/Kconfig
kernel/time/Kconfig:155:warning: range is invalid
Signed-off-by: Nikola Kotur <kotnick@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This fix two related bugs:
* PTRACE_GETREGS doesn't get the right orig_ax (syscall) value
* PTRACE_SETREGS can't set the orig_ax value (erased by initial value)
Get rid of the now useless and error-prone get_syscall().
Fix inconsistent behavior in the ptrace implementation for i386 when
updating orig_eax automatically update the syscall number as well. This
is now updated in handle_syscall().
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Anton Ivanov <aivanov@brocade.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: David Drysdale <drysdale@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kees Cook <keescook@chromium.org>
UML is using an obsolete itimer call for
all timers and "polls" for kernel space timer firing
in its userspace portion resulting in a long list
of bugs and incorrect behaviour(s). It also uses
ITIMER_VIRTUAL for its timer which results in the
timer being dependent on it running and the cpu
load.
This patch fixes this by moving to posix high resolution
timers firing off CLOCK_MONOTONIC and relaying the timer
correctly to the UML userspace.
Fixes:
- crashes when hosts suspends/resumes
- broken userspace timers - effecive ~40Hz instead
of what they should be. Note - this modifies skas behavior
by no longer setting an itimer per clone(). Timer events
are relayed instead.
- kernel network packet scheduling disciplines
- tcp behaviour especially under load
- various timer related corner cases
Finally, overall responsiveness of userspace is better.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Anton Ivanov <aivanov@brocade.com>
[rw: massaged commit message]
Signed-off-by: Richard Weinberger <richard@nod.at>
When declaring __syscall_stub_start, use the same type in UML userspace
code as in arch/um/include/asm/sections.h.
While at it, also declare batch_syscall_stub as char[].
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
atomic_notifier_chain_register() and uml_postsetup() do call kernel code
that rely on the "current" kernel macro and a valid task_struct resp.
thread_info struct. Give those functions a valid stack by moving
uml_postsetup() in the init_thread stack. This moves enables a panic()
call in this early code to generate a valid stacktrace, instead of
crashing.
E.g. when an UML kernel is started with an initrd but too few physical
memory the panic() call get's actually processed.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Before we had SKAS0 UML had two modes of operation
TT (tracing thread) and SKAS3/4 (separated kernel address space).
TT was known to be insecure and got removed a long time ago.
SKAS3/4 required a few (3 or 4) patches on the host side which never went
mainline. The last host patch is 10 years old.
With SKAS0 mode (separated kernel address space using 0 host patches),
default since 2005, SKAS3/4 is obsolete and can be removed.
Signed-off-by: Richard Weinberger <richard@nod.at>
This reverts commit 0974a9cadc.
The real for for that issue is to release current->mm->mmap_sem in
fix_range_common().
Signed-off-by: Richard Weinberger <richard@nod.at>
Currently we use both struct siginfo and siginfo_t.
Let's use struct siginfo internally to avoid ongoing
compiler warning. We are allowed to do so because
struct siginfo and siginfo_t are equivalent.
Signed-off-by: Richard Weinberger <richard@nod.at>
If we die within a stub handler we only way to reliable
kill the (obviously) dying uml guest process is killing
it's host twin on the host side.
Signed-off-by: Richard Weinberger <richard@nod.at>
UML guest processes now get correct siginfo_t for SIGTRAP, SIGFPE,
SIGILL and SIGBUS. Specifically, si_addr and si_code are now correct
where previously they were si_addr = NULL and si_code = 128.
Signed-off-by: Martin Pärtel <martin.partel@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
For one thing, we always block the same signals (IRQ ones - IO, WINCH, VTALRM),
so there's no need to pass sa_mask elements in arguments. For another, the
flags depend only on whether it's an IRQ signal or not (we add SA_RESTART
for them).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
We used to generate those, but we hadn't done that for a long
time. No need to bother blocking them for signal handlers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Some time ago Jeff prepared 42daba3165 ("uml: stop saving process FP
state") for UML to stop saving the process FP state between task
switches. The assumption was that since with SKAS0 every guest process
runs inside a host process context the host OS will take care of keeping
the proper FP state.
Unfortunately this is not true for multi-threaded applications, where
all guest threads share a single host process context yet all may use
the FPU on their own. Although I haven't verified it I suspect things
to be even worse in SKAS3 mode where all guest processes run inside a
single host process.
The patch reintroduces the saving and restoring of the FP context
between task switches.
[richard@nod.at: Ingo posted this patch in 2009, sadly it was never applied
and got lost. Now in 2011 the problem was reported by Gunnar.]
Signed-off-by: Ingo van Lil <inguin@gmx.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Reported-by: <gunnarlindroth@hotmail.com>
Tested-by: <gunnarlindroth@hotmail.com>
Cc: Stanislav Meduna <stano@meduna.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Make some variables and functions static, since they don't need to be
global.
- Remove an unused function - arch/um/kernel/time.c::sched_clock().
- Clean the style a bit as complained by checkpatch.pl.
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We lost the marking of SIGWINCH as being OK to receive during stub
execution, causing a panic should that happen.
Cc: Benedict Verheyen <benedict.verheyen@gmail.com>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few random style fixes.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit ee3d9bd4de ("uml: simplify SIGSEGV
handling"), while greatly simplifying the kernel SIGSEGV handler that
runs in the process address space, introduced a bug which corrupts FP
state in the process.
Previously, the SIGSEGV handler called the sigreturn system call by hand - it
couldn't return through the restorer provided to it because that could try to
call the libc restorer which likely wouldn't exist in the process address
space. So, it blocked off some signals, including SIGUSR1, on entry to the
SIGSEGV handler, queued a SIGUSR1 to itself, and invoked sigreturn. The
SIGUSR1 was delivered, and was visible to the UML kernel after sigreturn
finished.
The commit eliminated the signal masking and the call to sigreturn. The
handler simply hits itself with a SIGTRAP to let the UML kernel know that it
is finished. UML then restores the process registers, which effectively
longjmps the process out of the signal handler, skipping sigreturn's restoring
of register state and the signal mask.
The bug is that the host apparently sets used_fp to 0 when it saves the
process FP state in the sigcontext on the process signal stack. Thus, when
the process is longjmped out of the handler, its FP state is corrupt because
it wasn't saved on the context switch to the UML kernel.
This manifested itself as sleep hanging. For some reason, sleep uses floating
point in order to calculate the sleep interval. When a page fault corrupts
its FP state, it is faked into essentially sleeping forever.
This patch saves the FP state before entering the SIGSEGV handler and restores
it afterwards.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Style changes under arch/um/os-Linux:
include trimming
CodingStyle fixes
some printks needed severity indicators
make_tempfile turns out not to be used outside of mem.c, so it is now static.
Its declaration in tempfile.h is no longer needed, and tempfile.h itself is no
longer needed.
create_tmp_file was also made static.
checkpatch moans about an EXPORT_SYMBOL in user_syms.c which is part of a
macro definition - this is copying a bit of kernel infrastructure into the
libc side of UML because the kernel headers can't be included there.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some printks were missing newlines.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves sig_handler_common_skas from
arch/um/os-Linux/skas/trap.c to its only caller in
arch/um/os-Linux/signal.c. trap.c is now empty, so it can be removed.
This is code movement only - the significant cleanup needed here is
done in the next patch.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kill a process that tries to branch into a stub and execute a system
call. There are no security implications here - a system call in a
stub is treated the same as a system call anywhere else. But if a
process is trying to branch into a stub, either it is trying something
nasty or it has gone haywire, so it's a good idea to get rid of it in
either case.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
UML was panicing in the case of failures of libc calls which shouldn't happen.
This is an overreaction since a failure from libc doesn't normally mean that
kernel data structures are in an unknown state. Instead, the current process
should just be killed if there is no way to recover.
The case that prompted this was a failure of PTRACE_SETREGS restoring the same
state that was read by PTRACE_GETREGS. It appears that when a process tries
to load a bogus value into a segment register, it segfaults (as expected) and
the value is actually loaded and is seen by PTRACE_GETREGS (not expected).
This case is fixed by forcing a fatal SIGSEGV on the process so that it
immediately dies. fatal_sigsegv was added for this purpose. It was declared
as noreturn, so in order to pursuade gcc that it actually does not return, I
added a call to os_dump_core (and declared it noreturn) so that I get a core
file if somehow the process survives.
All other calls in arch/um/os-Linux/skas/process.c got the same treatment,
with failures causing the process to die instead of a kernel panic, with some
exceptions.
userspace_tramp exits with status 1 if anything goes wrong there. That will
cause start_userspace to return an error. copy_context_skas0 and
map_stub_pages also now return errors instead of panicing. Callers of thes
functions were changed to check for errors and do something appropriate.
Usually that's to return an error to their callers.
check_skas3_ptrace_faultinfo just exits since that's too early to do anything
else.
save_registers, restore_registers, and init_registers now return status
instead of panicing on failure, with their callers doing something
appropriate.
There were also duplicate declarations of save_registers and restore_registers
in os.h - these are gone.
I noticed and fixed up some whitespace damage.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some register accessor cleanups -
userspace() was calling restore_registers and save_registers for no
reason, since userspace() is on the libc side of the house, and these
add no value over calling ptrace directly
init_thread_registers and get_safe_registers were the same thing,
so init_thread_registers is gone
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify the page fault stub by not masking signals while it is running. This
allows it to signal that it is done by executing an instruction which will
generate a SIGTRAP (int3 on x86) rather than running sigreturn by hand after
queueing a blocked SIGUSR1.
userspace_tramp now no longer puts anything in the SIGSEGV sa_mask, but it
does add SA_NODEFER to sa_flags so that SIGSEGV is still enabled after the
signal handler fails to run sigreturn.
SIGWINCH is just blocked so that we don't have to deal with it and the signal
masks used by wait_stub_done are updated to reflect the smaller number of
signals that it has to worry about.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>