Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.
Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.
Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As execution domain support is gone we can remove
signal translation from the signal code and remove
exec_domain from thread_info.
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: David S. Miller <davem@davemloft.net>
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.
Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.
Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.
It's also a decent simplification, since the restart code is more or less
identical on all architectures.
[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull audit updates from Eric Paris:
"So this change across a whole bunch of arches really solves one basic
problem. We want to audit when seccomp is killing a process. seccomp
hooks in before the audit syscall entry code. audit_syscall_entry
took as an argument the arch of the given syscall. Since the arch is
part of what makes a syscall number meaningful it's an important part
of the record, but it isn't available when seccomp shoots the
syscall...
For most arch's we have a better way to get the arch (syscall_get_arch)
So the solution was two fold: Implement syscall_get_arch() everywhere
there is audit which didn't have it. Use syscall_get_arch() in the
seccomp audit code. Having syscall_get_arch() everywhere meant it was
a useless flag on the stack and we could get rid of it for the typical
syscall entry.
The other changes inside the audit system aren't grand, fixed some
records that had invalid spaces. Better locking around the task comm
field. Removing some dead functions and structs. Make some things
static. Really minor stuff"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: rename audit_log_remove_rule to disambiguate for trees
audit: cull redundancy in audit_rule_change
audit: WARN if audit_rule_change called illegally
audit: put rule existence check in canonical order
next: openrisc: Fix build
audit: get comm using lock to avoid race in string printing
audit: remove open_arg() function that is never used
audit: correct AUDIT_GET_FEATURE return message type
audit: set nlmsg_len for multicast messages.
audit: use union for audit_field values since they are mutually exclusive
audit: invalid op= values for rules
audit: use atomic_t to simplify audit_serial()
kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
audit: reduce scope of audit_log_fcaps
audit: reduce scope of audit_net_id
audit: arm64: Remove the audit arch argument to audit_syscall_entry
arm64: audit: Add audit hook in syscall_trace_enter/exit()
audit: x86: drop arch from __audit_syscall_entry() interface
sparc: implement is_32bit_task
sparc: properly conditionalize use of TIF_32BIT
...
This breaks the stack end corruption detection facility.
What that facility does it write a magic value to "end_of_stack()"
and checking to see if it gets overwritten.
"end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is
the beginning of the FPU register save area.
So once the user uses the FPU, the magic value is overwritten and the
debug checks trigger.
Fix this by making the size explicit.
Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we
are limited to 7 levels of FPU state saves. So each FPU register set
is 256 bytes, allocate 256 * 7 for the fpregs area.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are currently embedding the same check from thread_info.h into
syscall.h thanks to the way syscall_get_arch() was implemented in the
audit tree. Instead create a new function, is_32bit_task() which is
similar to that found on the powerpc arch. This simplifies the
syscall.h code and makes the build/Kconfig requirements much easier
to understand.
Signed-off-by: Eric Paris <eparis@redhat.com
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: sparclinux@vger.kernel.org
We've witnessed a few TLB events causing the machine to power off because
of prom_halt. In one case it was some nfs related area during rmmod. Another
was an mmapper of /dev/mem. A more recent one is an ITLB issue with
a bad pagesize which could be a hardware bug. Bugs happen but we should
attempt to not power off the machine and/or hang it when possible.
This is a DTLB error from an mmapper of /dev/mem:
[root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
SUN4V-DTLB: TPC<0xfffff80100903e6c>
SUN4V-DTLB: O7[fffff801081979d0]
SUN4V-DTLB: O7<0xfffff801081979d0>
SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
.
This is recent mainline for ITLB:
[ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
[ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
[ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
[ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
.
Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
prom_halt() and drop us to OF command prompt "ok". This isn't the case for
LDOMs and the machine powers off.
For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
one("1"). Otherwise, for %tl > 1, we proceed eventually to die_if_kernel().
The logic of this patch was partially inspired by David Miller's feedback.
Power off of large sparc64 machines is painful. Plus die_if_kernel provides
more context. A reset sequence isn't a brief period on large sparc64 but
better than power-off/power-on sequence.
Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull irq cleanups from Ingo Molnar:
"This is a multi-arch cleanup series from Thomas Gleixner, which we
kept to near the end of the merge window, to not interfere with
architecture updates.
This series (motivated by the -rt kernel) unifies more aspects of IRQ
handling and generalizes PREEMPT_ACTIVE"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
preempt: Make PREEMPT_ACTIVE generic
sparc: Use preempt_schedule_irq
ia64: Use preempt_schedule_irq
m32r: Use preempt_schedule_irq
hardirq: Make hardirq bits generic
m68k: Simplify low level interrupt handling code
genirq: Prevent spurious detection for unconditionally polled interrupts
Mark the places when the system are in user or are in kernel.
This is used to make full dynticks system (tickless) --
CONFIG_NO_HZ_FULL dependence.
Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move it to a common place. Preparatory patch for implementing
set/clear for the idle need_resched poll implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130321215233.446034505@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Backmerge from the point in mainline where a trivial conflict had been
introduced (arch/sparc/kernel/sys_sparc_64.c had grown sys_kern_features()
right after where kernel_execve() used to be)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The Montgomery Multiply, Montgomery Square, and Multiple-Precision
Multiply instructions work by loading a combination of the floating
point and multiple register windows worth of integer registers
with the inputs.
These values are 64-bit. But for 32-bit userland processes we only
save the low 32-bits of each integer register during a register spill.
This is because the register window save area is in the user stack and
has a fixed layout.
Therefore, the only way to use these instruction in 32-bit mode is to
perform the following sequence:
1) Load the top-32bits of a choosen integer register with a sentinel,
say "-1". This will be in the outer-most register window.
The idea is that we're trying to see if the outer-most register
window gets spilled, and thus the 64-bit values were truncated.
2) Load all the inputs for the montmul/montsqr/mpmul instruction,
down to the inner-most register window.
3) Execute the opcode.
4) Traverse back up to the outer-most register window.
5) Check the sentinel, if it's still "-1" store the results.
Otherwise retry the entire sequence.
This retry is extremely troublesome. If you're just unlucky and an
interrupt or other trap happens, it'll push that outer-most window to
the stack and clear the sentinel when we restore it.
We could retry forever and never make forward progress if interrupts
arrive at a fast enough rate (consider perf events as one example).
So we have do limited retries and fallback to software which is
extremely non-deterministic.
Luckily it's very straightforward to provide a mechanism to let
32-bit applications use a 64-bit stack. Stacks in 64-bit mode are
biased by 2047 bytes, which means that the lowest bit is set in the
actual %sp register value.
So if we see bit zero set in a 32-bit application's stack we treat
it like a 64-bit stack.
Runtime detection of such a facility is tricky, and cumbersome at
best. For example, just trying to use a biased stack and seeing if it
works is hard to recover from (the signal handler will need to use an
alt stack, plus something along the lines of longjmp). Therefore, we
add a system call to report a bitmask of arch specific features like
this in a cheap and less hairy way.
With help from Andy Polyakov.
Signed-off-by: David S. Miller <davem@davemloft.net>
Move that sucker to just before TI_FPDEPTH and replace stb with sth in
etrap_save(). Take current_ds to its old place, so that we don't push
wsaved into TI_... flags. That allows to lose clearing syscall_noerror
on return from syscall.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make default just return 0. The current default (checking
TIF_POLLING_NRFLAG) is taken to architectures that need it;
ones that don't do polling in their idle threads don't need
to defined TIF_POLLING_NRFLAG at all.
ia64 defined both TS_POLLING (used by its tsk_is_polling())
and TIF_POLLING_NRFLAG (not used at all). Killed the latter...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a node parameter to alloc_thread_info(), and change its name to
alloc_thread_info_node()
This change is needed to allow NUMA aware kthread_create_on_cpu()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
People should be using the perf events interfaces, and
the way these system call facilities used the %pcr conflicts
with the usage of the NMI watchdog and perf events.
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are the sparc bits to remove TIF_ABI_PENDING now that
set_personality() is called at the appropriate place in exec.
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull the initial preempt_count value into a single
definition site.
Maintainers for: alpha, ia64 and m68k, please have a look,
your arch code is funny.
The header magic is a bit odd, but similar to the KERNEL_DS
one, CPP waits with expanding these macros until the
INIT_THREAD_INFO macro itself is expanded, which is in
arch/*/kernel/init_task.c where we've already included
sched.h so we're good.
Cc: tony.luck@intel.com
Cc: rth@twiddle.net
Cc: geert@linux-m68k.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: build fix
Today's linux-next build (sparc64 defconfig) failed like this:
arch/sparc/kernel/built-in.o: In function `trap_init':
(.init.text+0x4): undefined reference to `thread_info_offsets_are_bolixed_dave'
Caused by commit 52400ba946 ("futex: add
requeue_pi functionality") (from the tip-core tree) which changed the
size of struct restart_block.
Shift TI_KUNA_REGS and TI_KUNA_INSN up by 8 bytes to make space for the
larger restart block.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: Darren Hart <dvhltc@us.ibm.com>
LKML-Reference: <20090409151722.c8eabb56.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch series introduces a cgroup subsystem that utilizes the swsusp
freezer to freeze a group of tasks. It's immediately useful for batch job
management scripts. It should also be useful in the future for
implementing container checkpoint/restart.
The freezer subsystem in the container filesystem defines a cgroup file
named freezer.state. Reading freezer.state will return the current state
of the cgroup. Writing "FROZEN" to the state file will freeze all tasks
in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup.
* Examples of usage :
# mkdir /containers/freezer
# mount -t cgroup -ofreezer freezer /containers
# mkdir /containers/0
# echo $some_pid > /containers/0/tasks
to get status of the freezer subsystem :
# cat /containers/0/freezer.state
RUNNING
to freeze all tasks in the container :
# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
FREEZING
# cat /containers/0/freezer.state
FROZEN
to unfreeze all tasks in the container :
# echo RUNNING > /containers/0/freezer.state
# cat /containers/0/freezer.state
RUNNING
This patch:
The first step in making the refrigerator() available to all
architectures, even for those without power management.
The purpose of such a change is to be able to use the refrigerator() in a
new control group subsystem which will implement a control group freezer.
[akpm@linux-foundation.org: fix sparc]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds TIF_NOTIFY_RESUME support for sparc64.
When set, we call tracehook_notify_resume() on the way to user mode.
Signed-off-by: Roland McGrath <roland@redhat.com>
The majority of this patch was created by the following script:
***
ASM=arch/sparc/include/asm
mkdir -p $ASM
git mv include/asm-sparc64/ftrace.h $ASM
git rm include/asm-sparc64/*
git mv include/asm-sparc/* $ASM
sed -ie 's/asm-sparc64/asm/g' $ASM/*
sed -ie 's/asm-sparc/asm/g' $ASM/*
***
The rest was an update of the top-level Makefile to use sparc
for header files when sparc64 is being build.
And a small fixlet to pick up the correct unistd.h from
sparc64 code.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>