2019-06-03 13:44:50 +08:00
|
|
|
// SPDX-License-Identifier: GPL-2.0-only
|
2012-03-05 19:49:28 +08:00
|
|
|
/*
|
|
|
|
* Based on arch/arm/kernel/process.c
|
|
|
|
*
|
|
|
|
* Original Copyright (C) 1995 Linus Torvalds
|
|
|
|
* Copyright (C) 1996-2000 Russell King - Converted to ARM.
|
|
|
|
* Copyright (C) 2012 ARM Ltd.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <stdarg.h>
|
|
|
|
|
2014-04-30 17:51:32 +08:00
|
|
|
#include <linux/compat.h>
|
2015-03-06 22:49:24 +08:00
|
|
|
#include <linux/efi.h>
|
2020-03-17 00:50:47 +08:00
|
|
|
#include <linux/elf.h>
|
2012-03-05 19:49:28 +08:00
|
|
|
#include <linux/export.h>
|
|
|
|
#include <linux/sched.h>
|
2017-02-09 01:51:35 +08:00
|
|
|
#include <linux/sched/debug.h>
|
2017-02-09 01:51:36 +08:00
|
|
|
#include <linux/sched/task.h>
|
2017-02-09 01:51:37 +08:00
|
|
|
#include <linux/sched/task_stack.h>
|
2012-03-05 19:49:28 +08:00
|
|
|
#include <linux/kernel.h>
|
arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.
However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.
* Task 1:
        PAN == 0 ---|                          |---------------
                    |                          |<- return from IRQ, PSTATE.PAN = 0
                    | <- IRQ                   |
                    +--------+ <- preempt()  +--
                                             ^
                                             |
                                             reschedule Task 1, PSTATE.PAN == 1
* Init:
        --------------------+------------------------
                            ^
                            |
                            enable_cpu_features
                            set PSTATE.PAN on all CPUs
Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).
Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.
This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early; the
one exception is BRK for WARN_ON(), but as this uses do_debug_exception(),
which keeps IRQs masked, it shouldn't call schedule().
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[james: Replaced a really cool hack, with an even simpler static key in C.
expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-10-16 01:25:44 +08:00
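The gating described in the commit message above can be sketched in plain C. This is only an illustrative user-space model, not the kernel's code: the flag stands in for the static key the message mentions, and every `sketch_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the static key mentioned in the commit message. */
static bool sketch_cpufeatures_finalized;

/* Assumed hook: called once features have been enabled on all CPUs. */
static void sketch_finalize_cpufeatures(void)
{
	sketch_cpufeatures_finalized = true;
}

/*
 * Decide whether returning from an IRQ may preempt the interrupted task.
 * Until features are finalized the answer is always "no", so no stale
 * PSTATE gets saved on a task's stack by an IRQ-return preemption.
 */
static bool sketch_irq_return_may_preempt(int preempt_count, bool need_resched)
{
	assert(preempt_count >= 0);

	if (!sketch_cpufeatures_finalized)
		return false;

	return preempt_count == 0 && need_resched;
}
```

In this model, a pending reschedule is simply ignored on IRQ return until `sketch_finalize_cpufeatures()` has run; afterwards the usual conditions apply.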
|
|
|
#include <linux/lockdep.h>
|
2020-03-17 00:50:47 +08:00
|
|
|
#include <linux/mman.h>
|
2012-03-05 19:49:28 +08:00
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/stddef.h>
|
2019-07-24 01:58:39 +08:00
|
|
|
#include <linux/sysctl.h>
|
2012-03-05 19:49:28 +08:00
|
|
|
#include <linux/unistd.h>
|
|
|
|
#include <linux/user.h>
|
|
|
|
#include <linux/delay.h>
|
|
|
|
#include <linux/reboot.h>
|
|
|
|
#include <linux/interrupt.h>
|
|
|
|
#include <linux/init.h>
|
|
|
|
#include <linux/cpu.h>
|
|
|
|
#include <linux/elfcore.h>
|
|
|
|
#include <linux/pm.h>
|
|
|
|
#include <linux/tick.h>
|
|
|
|
#include <linux/utsname.h>
|
|
|
|
#include <linux/uaccess.h>
|
|
|
|
#include <linux/random.h>
|
|
|
|
#include <linux/hw_breakpoint.h>
|
|
|
|
#include <linux/personality.h>
|
|
|
|
#include <linux/notifier.h>
|
2015-09-16 22:23:21 +08:00
|
|
|
#include <trace/events/power.h>
|
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-04 04:23:13 +08:00
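The layout requirement stated in the commit message above, offsetof(task_struct, thread_info) == 0, can be illustrated with a small stand-alone sketch. The structure and function names below are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced versions of the two structures. */
struct sketch_thread_info {
	unsigned long flags;
};

struct sketch_task {
	struct sketch_thread_info thread_info;	/* must stay the first member */
	int pid;
};

/* Compile-time check of the offset-zero requirement. */
_Static_assert(offsetof(struct sketch_task, thread_info) == 0,
	       "thread_info must be the first member of the task structure");

/*
 * Because the offset is zero, a pointer to the task is also a pointer to
 * its thread_info, so the lookup reduces to a cast.
 */
static struct sketch_thread_info *sketch_task_thread_info(struct sketch_task *tsk)
{
	return (struct sketch_thread_info *)tsk;
}
```

With this layout a single register holding the task pointer is enough to reach both structures, which is the property the message relies on when it stores the task pointer in sp_el0.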
|
|
|
#include <linux/percpu.h>
|
arm64/sve: Core task context handling
This patch adds the core support for switching and managing the SVE
architectural state of user tasks.
Calls to the existing FPSIMD low-level save/restore functions are
factored out as new functions task_fpsimd_{save,load}(), since SVE
now dynamically may or may not need to be handled at these points
depending on the kernel configuration, hardware features discovered
at boot, and the runtime state of the task. To make these
decisions as fast as possible, const cpucaps are used where
feasible, via the system_supports_sve() helper.
The SVE registers are only tracked for threads that have explicitly
used SVE, indicated by the new thread flag TIF_SVE. Otherwise, the
FPSIMD view of the architectural state is stored in
thread.fpsimd_state as usual.
When in use, the SVE registers are not stored directly in
thread_struct due to their potentially large and variable size.
Because the task_struct slab allocator must be configured very
early during kernel boot, it is also tricky to configure it
correctly to match the maximum vector length provided by the
hardware, since this depends on examining secondary CPUs as well as
the primary. Instead, a pointer sve_state in thread_struct points
to a dynamically allocated buffer containing the SVE register data,
and code is added to allocate and free this buffer at appropriate
times.
TIF_SVE is set when taking an SVE access trap from userspace, if
suitable hardware support has been detected. This enables SVE for
the thread: a subsequent return to userspace will disable the trap
accordingly. If such a trap is taken without sufficient system-
wide hardware support, SIGILL is sent to the thread instead as if
an undefined instruction had been executed: this may happen if
userspace tries to use SVE in a system where not all CPUs support
it for example.
The kernel will clear TIF_SVE and disable SVE for the thread
whenever an explicit syscall is made by userspace. For backwards
compatibility reasons and conformance with the spirit of the base
AArch64 procedure call standard, the subset of the SVE register
state that aliases the FPSIMD registers is still preserved across a
syscall even if this happens. The remainder of the SVE register
state logically becomes zero at syscall entry, though the actual
zeroing work is currently deferred until the thread next tries to
use SVE, causing another trap to the kernel. This implementation
is suboptimal: in the future, the fastpath case may be optimised
to zero the registers in-place and leave SVE enabled for the task,
where beneficial.
TIF_SVE is also cleared in the following slowpath cases, which are
taken as reasonable hints that the task may no longer use SVE:
* exec
* fork and clone
Code is added to sync data between thread.fpsimd_state and
thread.sve_state whenever enabling/disabling SVE, in a manner
consistent with the SVE architectural programmer's model.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: added #include to fix allnoconfig build]
[will: use enable_daif in do_sve_acc]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-10-31 23:51:05 +08:00
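The TIF_SVE lifecycle described in the commit message above (buffer allocated on first use, flag set on an access trap, flag cleared at syscall entry) might be modelled roughly as follows. This is a user-space sketch under simplifying assumptions: all `sketch_*` names are hypothetical, the -1 return stands in for the SIGILL case, and no register data is actually saved or zeroed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-thread state; sve_state is a pointer, not an embedded array. */
struct sketch_thread {
	bool tif_sve;		/* models the TIF_SVE thread flag */
	void *sve_state;	/* dynamically allocated register buffer */
	size_t sve_state_size;
};

/*
 * Models an SVE access trap from userspace. Returns 0 when SVE is enabled
 * for the thread, -1 when the system lacks support or allocation fails.
 */
static int sketch_sve_access_trap(struct sketch_thread *t,
				  bool system_supports_sve, size_t state_size)
{
	assert(t != NULL);

	if (!system_supports_sve)
		return -1;

	if (!t->sve_state) {
		t->sve_state = calloc(1, state_size);
		if (!t->sve_state)
			return -1;
		t->sve_state_size = state_size;
	}

	t->tif_sve = true;
	return 0;
}

/* Models syscall entry: the flag is cleared, the buffer is kept for reuse. */
static void sketch_syscall_entry(struct sketch_thread *t)
{
	t->tif_sve = false;
}

/* Releases the buffer when the thread goes away. */
static void sketch_thread_free(struct sketch_thread *t)
{
	free(t->sve_state);
	t->sve_state = NULL;
	t->sve_state_size = 0;
	t->tif_sve = false;
}
```

Keeping only a pointer in the per-thread structure is what lets the buffer size follow the vector length discovered at boot, as the message explains.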
|
|
|
#include <linux/thread_info.h>
|
2019-07-24 01:58:39 +08:00
|
|
|
#include <linux/prctl.h>
|
2012-03-05 19:49:28 +08:00
|
|
|
|
2016-02-05 22:58:48 +08:00
|
|
|
#include <asm/alternative.h>
|
2019-01-31 22:58:47 +08:00
|
|
|
#include <asm/arch_gicv3.h>
|
2012-03-05 19:49:28 +08:00
|
|
|
#include <asm/compat.h>
|
|
|
|
#include <asm/cpufeature.h>
|
2012-03-05 19:49:28 +08:00
|
|
|
#include <asm/cacheflush.h>
|
2016-10-18 18:27:48 +08:00
|
|
|
#include <asm/exec.h>
|
2013-01-17 20:31:45 +08:00
|
|
|
#include <asm/fpsimd.h>
|
|
|
|
#include <asm/mmu_context.h>
|
2012-03-05 19:49:28 +08:00
|
|
|
#include <asm/processor.h>
|
arm64: add basic pointer authentication support
This patch adds basic support for pointer authentication, allowing
userspace to make use of APIAKey, APIBKey, APDAKey, APDBKey, and
APGAKey. The kernel maintains key values for each process (shared by all
threads within), which are initialised to random values at exec() time.
The ID_AA64ISAR1_EL1.{APA,API,GPA,GPI} fields are exposed to userspace,
to describe that pointer authentication instructions are available and
that the kernel is managing the keys. Two new hwcaps are added for the
same reason: PACA (for address authentication) and PACG (for generic
authentication).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[will: Fix sizeof() usage and unroll address key initialisation]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-08 02:39:25 +08:00
|
|
|
#include <asm/pointer_auth.h>
|
2012-03-05 19:49:28 +08:00
|
|
|
#include <asm/stacktrace.h>
|
|
|
|
|
2018-12-12 20:08:44 +08:00
|
|
|
#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
|
2014-06-26 06:55:03 +08:00
|
|
|
#include <linux/stackprotector.h>
|
|
|
|
unsigned long __stack_chk_guard __read_mostly;
|
|
|
|
EXPORT_SYMBOL(__stack_chk_guard);
|
|
|
|
#endif
|
|
|
|
|
2012-03-05 19:49:28 +08:00
|
|
|
/*
|
|
|
|
* Function pointers to optional machine specific functions
|
|
|
|
*/
|
|
|
|
void (*pm_power_off)(void);
|
|
|
|
EXPORT_SYMBOL_GPL(pm_power_off);
|
|
|
|
|
2013-07-23 18:05:10 +08:00
|
|
|
void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
|
2012-03-05 19:49:28 +08:00
|
|
|
|
2019-01-31 22:58:47 +08:00
|
|
|
static void __cpu_do_idle(void)
|
|
|
|
{
|
|
|
|
dsb(sy);
|
|
|
|
wfi();
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __cpu_do_idle_irqprio(void)
|
|
|
|
{
|
|
|
|
unsigned long pmr;
|
|
|
|
unsigned long daif_bits;
|
|
|
|
|
|
|
|
daif_bits = read_sysreg(daif);
|
|
|
|
write_sysreg(daif_bits | PSR_I_BIT, daif);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Unmask PMR before going idle to make sure interrupts can
|
|
|
|
* be raised.
|
|
|
|
*/
|
|
|
|
pmr = gic_read_pmr();
|
2019-06-11 17:38:10 +08:00
|
|
|
gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
|
2019-01-31 22:58:47 +08:00
|
|
|
|
|
|
|
__cpu_do_idle();
|
|
|
|
|
|
|
|
gic_write_pmr(pmr);
|
|
|
|
write_sysreg(daif_bits, daif);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* cpu_do_idle()
|
|
|
|
*
|
|
|
|
* Idle the processor (wait for interrupt).
|
|
|
|
*
|
|
|
|
* If the CPU supports priority masking we must do additional work to
|
|
|
|
* ensure that interrupts are not masked at the PMR (because the core will
|
|
|
|
* not wake up if we block the wake up signal in the interrupt controller).
|
|
|
|
*/
|
|
|
|
void cpu_do_idle(void)
|
|
|
|
{
|
|
|
|
if (system_uses_irq_prio_masking())
|
|
|
|
__cpu_do_idle_irqprio();
|
|
|
|
else
|
|
|
|
__cpu_do_idle();
|
|
|
|
}
|
|
|
|
|
2012-03-05 19:49:28 +08:00
|
|
|
/*
|
|
|
|
* This is our default idle handler.
|
|
|
|
*/
|
2013-03-22 05:49:39 +08:00
|
|
|
void arch_cpu_idle(void)
|
2012-03-05 19:49:28 +08:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* This should do all the clock switching and wait for interrupt
|
|
|
|
* tricks
|
|
|
|
*/
|
2014-02-17 23:59:30 +08:00
|
|
|
cpu_do_idle();
|
|
|
|
local_irq_enable();
|
2012-03-05 19:49:28 +08:00
|
|
|
}
|
|
|
|
|
2013-10-25 03:30:18 +08:00
|
|
|
#ifdef CONFIG_HOTPLUG_CPU
|
|
|
|
void arch_cpu_idle_dead(void)
|
|
|
|
{
|
|
|
|
cpu_die();
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
arm64: Fix machine_shutdown() definition
This patch ports most of commit 19ab428f4b79 "ARM: 7759/1: decouple CPU
offlining from reboot/shutdown" by Stephen Warren from arch/arm to
arch/arm64.
machine_shutdown() is a hook for kexec. Add a comment saying so, since
it isn't obvious from the function name.
Halt, power-off, and restart have different requirements re: stopping
secondary CPUs than kexec has. The former simply require the secondary
CPUs to be quiesced somehow, whereas kexec requires them to be
completely non-operational, so that no matter where the kexec target
images are written in RAM, they won't influence operation of the
secondary CPUs, which could happen if the CPUs were still executing some
kind of pin loop. To this end, modify machine_halt, power_off, and
restart to call smp_send_stop() directly, rather than calling
machine_shutdown().
In machine_shutdown(), replace the call to smp_send_stop() with a call
to disable_nonboot_cpus(). This completely disables all but one CPU,
thus satisfying the kexec requirements a couple paragraphs above.
Signed-off-by: Arun KS <getarunks@gmail.com>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-07 09:41:22 +08:00
|
|
|
/*
|
|
|
|
* Called by kexec, immediately prior to machine_kexec().
|
|
|
|
*
|
|
|
|
* This must completely disable all secondary CPUs; simply causing those CPUs
|
|
|
|
* to execute e.g. a RAM-based pin loop is not sufficient. This allows the
|
|
|
|
* kexec'd kernel to use any and all RAM as it sees fit, without having to
|
|
|
|
* avoid any code or data used by any SW CPU pin loop. The CPU hotplug
|
2020-03-23 21:50:59 +08:00
|
|
|
 * functionality embodied in smp_shutdown_nonboot_cpus() is used to achieve this.
|
|
|
|
*/
|
2012-03-05 19:49:28 +08:00
|
|
|
void machine_shutdown(void)
|
|
|
|
{
|
2020-03-23 21:51:00 +08:00
|
|
|
smp_shutdown_nonboot_cpus(reboot_cpu);
|
2012-03-05 19:49:28 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Halting simply requires that the secondary CPUs stop performing any
|
|
|
|
* activity (executing tasks, handling interrupts). smp_send_stop()
|
|
|
|
* achieves this.
|
|
|
|
*/
|
2012-03-05 19:49:28 +08:00
|
|
|
void machine_halt(void)
|
|
|
|
{
|
2014-05-07 09:41:23 +08:00
|
|
|
local_irq_disable();
|
|
|
|
smp_send_stop();
|
2012-03-05 19:49:28 +08:00
|
|
|
while (1);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Power-off simply requires that the secondary CPUs stop performing any
|
|
|
|
* activity (executing tasks, handling interrupts). smp_send_stop()
|
|
|
|
* achieves this. When the system power is turned off, it will take all CPUs
|
|
|
|
* with it.
|
|
|
|
*/
|
2012-03-05 19:49:28 +08:00
|
|
|
void machine_power_off(void)
|
|
|
|
{
|
2014-05-07 09:41:23 +08:00
|
|
|
local_irq_disable();
|
|
|
|
smp_send_stop();
|
2012-03-05 19:49:28 +08:00
|
|
|
if (pm_power_off)
|
|
|
|
pm_power_off();
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Restart requires that the secondary CPUs stop performing any activity
|
2015-04-20 17:24:35 +08:00
|
|
|
* while the primary CPU resets the system. Systems with multiple CPUs must
|
|
|
|
* provide a HW restart implementation, to ensure that all CPUs reset at once.
|
|
|
|
* This is required so that any code running after reset on the primary CPU
|
|
|
|
* doesn't have to co-ordinate with other CPUs to ensure they aren't still
|
|
|
|
* executing pre-reset code, and using RAM that the primary CPU's code wishes
|
|
|
|
* to use. Implementing such co-ordination would be essentially impossible.
|
|
|
|
*/
|
2012-03-05 19:49:28 +08:00
|
|
|
void machine_restart(char *cmd)
|
|
|
|
{
|
|
|
|
/* Disable interrupts first */
|
|
|
|
local_irq_disable();
|
2014-05-07 09:41:23 +08:00
|
|
|
smp_send_stop();
|
2012-03-05 19:49:28 +08:00
|
|
|
|
2015-03-06 22:49:24 +08:00
|
|
|
/*
|
|
|
|
* UpdateCapsule() depends on the system being reset via
|
|
|
|
* ResetSystem().
|
|
|
|
*/
|
|
|
|
if (efi_enabled(EFI_RUNTIME_SERVICES))
|
|
|
|
efi_reboot(reboot_mode, NULL);
|
|
|
|
|
2012-03-05 19:49:28 +08:00
|
|
|
/* Now call the architecture specific reboot code. */
|
2013-03-01 02:14:37 +08:00
|
|
|
if (arm_pm_restart)
|
2013-07-11 19:13:00 +08:00
|
|
|
arm_pm_restart(reboot_mode, cmd);
|
2014-09-26 08:03:16 +08:00
|
|
|
else
|
|
|
|
do_kernel_restart(cmd);
|
2012-03-05 19:49:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Whoops - the architecture was unable to reboot.
|
|
|
|
*/
|
|
|
|
printk("Reboot failed -- System halted\n");
|
|
|
|
while (1);
|
|
|
|
}
|
|
|
|
|
arm64: BTI: Decode BYTPE bits when printing PSTATE
The current code to print PSTATE symbolically when generating
backtraces etc. does not include the BTYPE field used by Branch
Target Identification.
So, decode BTYPE and print it too.
In the interests of human-readability, print the classes of BTI
matched. The symbolic notation, BTYPE (PSTATE[11:10]) and
permitted classes of subsequent instruction are:
-- (BTYPE=0b00): any insn
jc (BTYPE=0b01): BTI jc, BTI j, BTI c, PACIxSP
-c (BTYPE=0b10): BTI jc, BTI c, PACIxSP
j- (BTYPE=0b11): BTI jc, BTI j
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-17 00:50:48 +08:00
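The decoding described in the commit message above can be tried outside the kernel with a short sketch. The `SKETCH_*` constants and `sketch_*` names are hypothetical; the shift of 10 and the two-bit width are taken from the PSTATE[11:10] position given in the message, and the strings follow its table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed from the message: BTYPE occupies PSTATE bits [11:10]. */
#define SKETCH_BTYPE_SHIFT	10
#define SKETCH_BTYPE_MASK	(UINT64_C(3) << SKETCH_BTYPE_SHIFT)

/* Indexed by the two-bit BTYPE value: 0b00, 0b01, 0b10, 0b11. */
static const char *const sketch_btypes[4] = { "--", "jc", "-c", "j-" };

/* Extract the BTYPE field from a PSTATE value and return its symbolic form. */
static const char *sketch_btype_str(uint64_t pstate)
{
	unsigned int idx = (pstate & SKETCH_BTYPE_MASK) >> SKETCH_BTYPE_SHIFT;

	assert(idx < 4);
	return sketch_btypes[idx];
}
```

Masking before shifting means unrelated PSTATE bits (condition flags, interrupt masks) never influence the table index.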
|
|
|
#define bstr(suffix, str) [PSR_BTYPE_ ## suffix >> PSR_BTYPE_SHIFT] = str
|
|
|
|
static const char *const btypes[] = {
|
|
|
|
bstr(NONE, "--"),
|
|
|
|
bstr( JC, "jc"),
|
|
|
|
bstr( C, "-c"),
|
|
|
|
bstr( J , "j-")
|
|
|
|
};
|
|
|
|
#undef bstr
|
|
|
|
|
2017-10-19 20:26:26 +08:00
|
|
|
static void print_pstate(struct pt_regs *regs)
|
|
|
|
{
|
|
|
|
u64 pstate = regs->pstate;
|
|
|
|
|
|
|
|
if (compat_user_mode(regs)) {
|
|
|
|
printk("pstate: %08llx (%c%c%c%c %c %s %s %c%c%c)\n",
|
|
|
|
pstate,
|
2018-07-05 22:16:52 +08:00
|
|
|
pstate & PSR_AA32_N_BIT ? 'N' : 'n',
|
|
|
|
pstate & PSR_AA32_Z_BIT ? 'Z' : 'z',
|
|
|
|
pstate & PSR_AA32_C_BIT ? 'C' : 'c',
|
|
|
|
pstate & PSR_AA32_V_BIT ? 'V' : 'v',
|
|
|
|
pstate & PSR_AA32_Q_BIT ? 'Q' : 'q',
|
|
|
|
pstate & PSR_AA32_T_BIT ? "T32" : "A32",
|
|
|
|
pstate & PSR_AA32_E_BIT ? "BE" : "LE",
|
|
|
|
pstate & PSR_AA32_A_BIT ? 'A' : 'a',
|
|
|
|
pstate & PSR_AA32_I_BIT ? 'I' : 'i',
|
|
|
|
pstate & PSR_AA32_F_BIT ? 'F' : 'f');
|
2017-10-19 20:26:26 +08:00
|
|
|
} else {
|
|
|
|
const char *btype_str = btypes[(pstate & PSR_BTYPE_MASK) >>
|
|
|
|
PSR_BTYPE_SHIFT];
|
|
|
|
|
|
|
|
printk("pstate: %08llx (%c%c%c%c %c%c%c%c %cPAN %cUAO BTYPE=%s)\n",
|
2017-10-19 20:26:26 +08:00
|
|
|
pstate,
|
|
|
|
pstate & PSR_N_BIT ? 'N' : 'n',
|
|
|
|
pstate & PSR_Z_BIT ? 'Z' : 'z',
|
|
|
|
pstate & PSR_C_BIT ? 'C' : 'c',
|
|
|
|
pstate & PSR_V_BIT ? 'V' : 'v',
|
|
|
|
pstate & PSR_D_BIT ? 'D' : 'd',
|
|
|
|
pstate & PSR_A_BIT ? 'A' : 'a',
|
|
|
|
pstate & PSR_I_BIT ? 'I' : 'i',
|
|
|
|
pstate & PSR_F_BIT ? 'F' : 'f',
|
|
|
|
pstate & PSR_PAN_BIT ? '+' : '-',
|
|
|
|
pstate & PSR_UAO_BIT ? '+' : '-',
|
|
|
|
btype_str);
|
2017-10-19 20:26:26 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2012-03-05 19:49:28 +08:00
|
|
|
void __show_regs(struct pt_regs *regs)
|
|
|
|
{
|
2013-09-18 01:49:46 +08:00
|
|
|
int i, top_reg;
|
|
|
|
u64 lr, sp;
|
|
|
|
|
|
|
|
if (compat_user_mode(regs)) {
|
|
|
|
lr = regs->compat_lr;
|
|
|
|
sp = regs->compat_sp;
|
|
|
|
top_reg = 12;
|
|
|
|
} else {
|
|
|
|
lr = regs->regs[30];
|
|
|
|
sp = regs->sp;
|
|
|
|
top_reg = 29;
|
|
|
|
}
|
2012-03-05 19:49:28 +08:00
|
|
|
|
dump_stack: unify debug information printed by show_regs()
show_regs() is inherently arch-dependent but it does make sense to print
generic debug information and some archs already do albeit in slightly
different forms. This patch introduces a generic function to print debug
information from show_regs() so that different archs print out the same
information and it's much easier to modify what's printed.
show_regs_print_info() prints out the same debug info as dump_stack()
does plus task and thread_info pointers.
* Archs which didn't print debug info now do.
alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
um, xtensa
* Already prints debug info. Replaced with show_regs_print_info().
The printed information is a superset of what used to be there.
arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86
* s390 is special in that it used to print arch-specific information
along with generic debug info. Heiko and Martin think that the
arch-specific extra isn't worth keeping an s390-specific implementation.
Converted to use the generic version.
Note that now all archs print the debug info before actual register
dumps.
An example BUG() dump follows.
kernel BUG at /work/os/work/kernel/workqueue.c:4841!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6
RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246
RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
Call Trace:
[<ffffffff81000312>] do_one_initcall+0x122/0x170
[<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
[<ffffffff81c47760>] ? rest_init+0x140/0x140
[<ffffffff81c4776e>] kernel_init+0xe/0xf0
[<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
[<ffffffff81c47760>] ? rest_init+0x140/0x140
...
v2: Typo fix in x86-32.
v3: CPU number dropped from show_regs_print_info() as
dump_stack_print_info() has been updated to print it. s390
specific implementation dropped as requested by s390 maintainers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile bits]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01 06:27:17 +08:00
|
|
|
show_regs_print_info(KERN_DEFAULT);
|
2017-10-19 20:26:26 +08:00
|
|
|
print_pstate(regs);
|
2018-02-20 00:46:57 +08:00
|
|
|
|
|
|
|
if (!user_mode(regs)) {
|
|
|
|
printk("pc : %pS\n", (void *)regs->pc);
|
2020-03-13 17:05:00 +08:00
|
|
|
printk("lr : %pS\n", (void *)ptrauth_strip_insn_pac(lr));
|
2018-02-20 00:46:57 +08:00
|
|
|
} else {
|
|
|
|
printk("pc : %016llx\n", regs->pc);
|
|
|
|
printk("lr : %016llx\n", lr);
|
|
|
|
}
|
|
|
|
|
2017-10-19 20:26:26 +08:00
|
|
|
printk("sp : %016llx\n", sp);
|
arm64: fix show_regs fallout from KERN_CONT changes
Recently in commit 4bcc595ccd80decb ("printk: reinstate KERN_CONT for
printing continuation lines"), the behaviour of printk changed w.r.t.
KERN_CONT. Now, KERN_CONT is mandatory to continue existing lines.
Without this, prefixes are inserted, making output illegible, e.g.
[ 1007.069010] pc : [<ffff00000871898c>] lr : [<ffff000008718948>] pstate: 40000145
[ 1007.076329] sp : ffff000008d53ec0
[ 1007.079606] x29: ffff000008d53ec0 [ 1007.082797] x28: 0000000080c50018
[ 1007.086160]
[ 1007.087630] x27: ffff000008e0c7f8 [ 1007.090820] x26: ffff80097631ca00
[ 1007.094183]
[ 1007.095653] x25: 0000000000000001 [ 1007.098843] x24: 000000ea68b61cac
[ 1007.102206]
... or when dumped with the userspace dmesg tool, which has slightly
different implicit newline behaviour. e.g.
[ 1007.069010] pc : [<ffff00000871898c>] lr : [<ffff000008718948>] pstate: 40000145
[ 1007.076329] sp : ffff000008d53ec0
[ 1007.079606] x29: ffff000008d53ec0
[ 1007.082797] x28: 0000000080c50018
[ 1007.086160]
[ 1007.087630] x27: ffff000008e0c7f8
[ 1007.090820] x26: ffff80097631ca00
[ 1007.094183]
[ 1007.095653] x25: 0000000000000001
[ 1007.098843] x24: 000000ea68b61cac
[ 1007.102206]
We can't simply always use KERN_CONT for lines which may or may not be
continuations. That causes line prefixes (e.g. timestamps) to be
suppressed, and the alignment of all but the first line will be broken.
For even more fun, we can't simply insert some dummy empty-string printk
calls, as GCC warns for an empty printk string, and even if we pass
KERN_DEFAULT explicitly to silence the warning, the prefix gets swallowed
unless there is an additional part to the string.
Instead, we must manually iterate over pairs of registers, which gives
us the legible output we want in either case, e.g.
[ 169.771790] pc : [<ffff00000871898c>] lr : [<ffff000008718948>] pstate: 40000145
[ 169.779109] sp : ffff000008d53ec0
[ 169.782386] x29: ffff000008d53ec0 x28: 0000000080c50018
[ 169.787650] x27: ffff000008e0c7f8 x26: ffff80097631de00
[ 169.792913] x25: 0000000000000001 x24: 00000027827b2cf4
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-10-20 19:23:16 +08:00
|
|
|
|
2019-01-31 22:58:46 +08:00
|
|
|
if (system_uses_irq_prio_masking())
|
|
|
|
printk("pmr_save: %08llx\n", regs->pmr_save);
|
|
|
|
|
2016-10-20 19:23:16 +08:00
|
|
|
i = top_reg;
|
|
|
|
|
|
|
|
while (i >= 0) {
|
2012-03-05 19:49:28 +08:00
|
|
|
printk("x%-2d: %016llx ", i, regs->regs[i]);
|
2016-10-20 19:23:16 +08:00
|
|
|
i--;
|
|
|
|
|
|
|
|
if (i % 2 == 0) {
|
|
|
|
pr_cont("x%-2d: %016llx ", i, regs->regs[i]);
|
|
|
|
i--;
|
|
|
|
}
|
|
|
|
|
|
|
|
pr_cont("\n");
|
2012-03-05 19:49:28 +08:00
|
|
|
}
|
|
|
|
}
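The pairwise register loop in `__show_regs()` can be modelled in userspace to see how many lines it emits for a given `top_reg`. The sketch below is an illustration only: `put_reg` and `format_reg_pairs` are hypothetical names, and `snprintf` into a caller-supplied buffer stands in for `printk`/`pr_cont`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Append one "xN : value " field at *off; returns 0 on success, -1 if full. */
static int put_reg(char *buf, size_t len, size_t *off, int idx, uint64_t val)
{
	int n = snprintf(buf + *off, len - *off, "x%-2d: %016llx ",
			 idx, (unsigned long long)val);

	if (n < 0 || (size_t)n >= len - *off)
		return -1;
	*off += (size_t)n;
	return 0;
}

/*
 * Userspace model of the pairing loop: each iteration starts a line with
 * register i and, if the next index down is even, continues it with i - 1.
 * Returns the number of lines written, or -1 if buf is too small.
 */
int format_reg_pairs(char *buf, size_t len, const uint64_t *regs, int top_reg)
{
	size_t off = 0;
	int lines = 0;
	int i = top_reg;

	assert(buf != NULL && regs != NULL && len > 0 && top_reg >= 0);

	while (i >= 0) {
		if (put_reg(buf, len, &off, i, regs[i]))
			return -1;
		i--;

		/* In C, -1 % 2 is -1, so the pair is skipped once i runs out. */
		if (i % 2 == 0) {
			if (put_reg(buf, len, &off, i, regs[i]))
				return -1;
			i--;
		}

		/* Terminate the line; needs room for '\n' plus the NUL. */
		if (len - off < 2)
			return -1;
		buf[off++] = '\n';
		buf[off] = '\0';
		lines++;
	}
	return lines;
}
```

With `top_reg = 29` every line holds a pair. With the compat value `top_reg = 12`, the first line holds only x12, because 11 is odd, and the remaining registers pair up as x11/x10 down to x1/x0.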
|
|
|
|
|
|
|
|
void show_regs(struct pt_regs * regs)
|
|
|
|
{
|
|
|
|
__show_regs(regs);
|
2020-06-09 12:30:23 +08:00
|
|
|
dump_backtrace(regs, NULL, KERN_DEFAULT);
|
2012-03-05 19:49:28 +08:00
|
|
|
}
|
|
|
|
|
2014-09-11 21:38:16 +08:00
|
|
|
static void tls_thread_flush(void)
|
|
|
|
{
|
2016-09-08 20:55:38 +08:00
|
|
|
write_sysreg(0, tpidr_el0);
|
2014-09-11 21:38:16 +08:00
|
|
|
|
|
|
|
if (is_compat_task()) {
|
2018-03-28 17:50:49 +08:00
|
|
|
current->thread.uw.tp_value = 0;
|
2014-09-11 21:38:16 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We need to ensure ordering between the shadow state and the
|
|
|
|
* hardware state, so that we don't corrupt the hardware state
|
|
|
|
* with a stale shadow state during context switch.
|
|
|
|
*/
|
|
|
|
barrier();
|
2016-09-08 20:55:38 +08:00
|
|
|
write_sysreg(0, tpidrro_el0);
|
2014-09-11 21:38:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-07-24 01:58:39 +08:00
|
|
|
static void flush_tagged_addr_state(void)
|
|
|
|
{
|
|
|
|
if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI))
|
|
|
|
clear_thread_flag(TIF_TAGGED_ADDR);
|
|
|
|
}
|
|
|
|
|
2012-03-05 19:49:28 +08:00
|
|
|
void flush_thread(void)
|
|
|
|
{
|
|
|
|
fpsimd_flush_thread();
|
2014-09-11 21:38:16 +08:00
|
|
|
tls_thread_flush();
|
2012-03-05 19:49:28 +08:00
|
|
|
flush_ptrace_hw_breakpoint(current);
|
2019-07-24 01:58:39 +08:00
|
|
|
flush_tagged_addr_state();
|
2012-03-05 19:49:28 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void release_thread(struct task_struct *dead_task)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
arm64/sve: Core task context handling
This patch adds the core support for switching and managing the SVE
architectural state of user tasks.
Calls to the existing FPSIMD low-level save/restore functions are
factored out as new functions task_fpsimd_{save,load}(), since SVE
now dynamically may or may not need to be handled at these points
depending on the kernel configuration, hardware features discovered
at boot, and the runtime state of the task. To make these
decisions as fast as possible, const cpucaps are used where
feasible, via the system_supports_sve() helper.
The SVE registers are only tracked for threads that have explicitly
used SVE, indicated by the new thread flag TIF_SVE. Otherwise, the
FPSIMD view of the architectural state is stored in
thread.fpsimd_state as usual.
When in use, the SVE registers are not stored directly in
thread_struct due to their potentially large and variable size.
Because the task_struct slab allocator must be configured very
early during kernel boot, it is also tricky to configure it
correctly to match the maximum vector length provided by the
hardware, since this depends on examining secondary CPUs as well as
the primary. Instead, a pointer sve_state in thread_struct points
to a dynamically allocated buffer containing the SVE register data,
and code is added to allocate and free this buffer at appropriate
times.
TIF_SVE is set when taking an SVE access trap from userspace, if
suitable hardware support has been detected. This enables SVE for
the thread: a subsequent return to userspace will disable the trap
accordingly. If such a trap is taken without sufficient system-
wide hardware support, SIGILL is sent to the thread instead as if
an undefined instruction had been executed: this may happen if
userspace tries to use SVE in a system where not all CPUs support
it for example.
The kernel will clear TIF_SVE and disable SVE for the thread
whenever an explicit syscall is made by userspace. For backwards
compatibility reasons and conformance with the spirit of the base
AArch64 procedure call standard, the subset of the SVE register
state that aliases the FPSIMD registers is still preserved across a
syscall even if this happens. The remainder of the SVE register
state logically becomes zero at syscall entry, though the actual
zeroing work is currently deferred until the thread next tries to
use SVE, causing another trap to the kernel. This implementation
is suboptimal: in the future, the fastpath case may be optimised
to zero the registers in-place and leave SVE enabled for the task,
where beneficial.
TIF_SVE is also cleared in the following slowpath cases, which are
taken as reasonable hints that the task may no longer use SVE:
* exec
* fork and clone
Code is added to sync data between thread.fpsimd_state and
thread.sve_state whenever enabling/disabling SVE, in a manner
consistent with the SVE architectural programmer's model.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: added #include to fix allnoconfig build]
[will: use enable_daif in do_sve_acc]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-10-31 23:51:05 +08:00
|
|
|
void arch_release_task_struct(struct task_struct *tsk)
|
|
|
|
{
|
|
|
|
fpsimd_release_task(tsk);
|
|
|
|
}
|
|
|
|
|
2012-03-05 19:49:28 +08:00
|
|
|
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
|
|
|
|
{
|
2015-06-11 12:04:32 +08:00
|
|
|
if (current->mm)
|
|
|
|
fpsimd_preserve_current_state();
|
2012-03-05 19:49:28 +08:00
|
|
|
*dst = *src;
|
2017-10-31 23:51:05 +08:00
|
|
|
|
2019-10-01 04:56:00 +08:00
|
|
|
/* We rely on the above assignment to initialize dst's thread_flags: */
|
|
|
|
BUILD_BUG_ON(!IS_ENABLED(CONFIG_THREAD_INFO_IN_TASK));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Detach src's sve_state (if any) from dst so that it does not
|
|
|
|
* get erroneously used or freed prematurely. dst's sve_state
|
|
|
|
* will be allocated on demand later on if dst uses SVE.
|
|
|
|
* For consistency, also clear TIF_SVE here: this could be done
|
|
|
|
* later in copy_process(), but to avoid tripping up future
|
|
|
|
* maintainers it is best not to leave TIF_SVE and sve_state in
|
|
|
|
* an inconsistent state, even temporarily.
|
|
|
|
*/
|
|
|
|
dst->thread.sve_state = NULL;
|
|
|
|
clear_tsk_thread_flag(dst, TIF_SVE);
|
|
|
|
|
2012-03-05 19:49:28 +08:00
|
|
|
return 0;
|
|
|
|
}
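The detach step in `arch_dup_task_struct()` can be illustrated with a reduced model. This is a sketch under stated assumptions: `struct model_task`, `MODEL_TIF_SVE` and `model_dup_task` are hypothetical stand-ins for `task_struct`, `TIF_SVE` and the real function, and the flag's bit position is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TIF_SVE (1UL << 0)	/* hypothetical bit position */

/* Reduced stand-in for the fields of task_struct that matter here. */
struct model_task {
	unsigned long flags;	/* models the thread flags */
	void *sve_state;	/* models thread.sve_state */
};

/*
 * Copy src into dst wholesale, then detach the SVE buffer and clear the
 * flag in the same step so the two never disagree, even briefly.
 */
void model_dup_task(struct model_task *dst, const struct model_task *src)
{
	assert(dst != NULL && src != NULL && dst != src);

	*dst = *src;			/* dst now aliases src's buffer */
	dst->sve_state = NULL;		/* drop the alias; src keeps ownership */
	dst->flags &= ~MODEL_TIF_SVE;	/* dst starts as a non-SVE task */
}
```

Without the two lines after the struct copy, both tasks would point at one buffer, and freeing either task would leave the other with a dangling pointer.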
|
|
|
|
|
|
|
|
asmlinkage void ret_from_fork(void) asm("ret_from_fork");
|
|
|
|
|
2020-06-11 17:04:15 +08:00
|
|
|
int copy_thread(unsigned long clone_flags, unsigned long stack_start,
|
2020-01-03 01:24:08 +08:00
|
|
|
unsigned long stk_sz, struct task_struct *p, unsigned long tls)
|
2012-03-05 19:49:28 +08:00
|
|
|
{
|
|
|
|
struct pt_regs *childregs = task_pt_regs(p);
|
|
|
|
|
2012-10-05 19:31:20 +08:00
|
|
|
memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
|
2012-03-05 19:49:28 +08:00
|
|
|
|
arm64: fpsimd: Prevent registers leaking from dead tasks
Currently, loading of a task's fpsimd state into the CPU registers
is skipped if that task's state is already present in the registers
of that CPU.
However, the code relies on the struct fpsimd_state * (and by
extension struct task_struct *) to unambiguously identify a task.
There is a particular case in which this doesn't work reliably:
when a task exits, its task_struct may be recycled to describe a
new task.
Consider the following scenario:
1) Task P loads its fpsimd state onto cpu C.
per_cpu(fpsimd_last_state, C) := P;
P->thread.fpsimd_state.cpu := C;
2) Task X is scheduled onto C and loads its fpsimd state on C.
per_cpu(fpsimd_last_state, C) := X;
X->thread.fpsimd_state.cpu := C;
3) X exits, causing X's task_struct to be freed.
4) P forks a new child T, which obtains X's recycled task_struct.
T == X.
T->thread.fpsimd_state.cpu == C (inherited from P).
5) T is scheduled on C.
T's fpsimd state is not loaded, because
per_cpu(fpsimd_last_state, C) == T (== X) &&
T->thread.fpsimd_state.cpu == C.
(This is the check performed by fpsimd_thread_switch().)
So, T gets X's registers because the last registers loaded onto C
were those of X, in (2).
This patch fixes the problem by ensuring that the sched-in check
fails in (5): fpsimd_flush_task_state(T) is called when T is
forked, so that T->thread.fpsimd_state.cpu == C cannot be true.
This relies on the fact that T is not schedulable until after
copy_thread() completes.
Once T's fpsimd state has been loaded on some CPU C there may still
be other cpus D for which per_cpu(fpsimd_last_state, D) ==
&X->thread.fpsimd_state. But D is necessarily != C in this case,
and the check in (5) must fail.
An alternative fix would be to do refcounting on task_struct. This
would result in each CPU holding a reference to the last task whose
fpsimd state was loaded there. It's not clear whether this is
preferable, and it involves higher overhead than the fix proposed
in this patch. It would also move all the task_struct freeing
work into the context switch critical section, or otherwise some
deferred cleanup mechanism would need to be introduced, neither of
which seems obviously justified.
Cc: <stable@vger.kernel.org>
Fixes: 005f78cd8849 ("arm64: defer reloading a task's FPSIMD state to userland resume")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: word-smithed the comment so it makes more sense]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-05 22:56:42 +08:00
|
|
|
/*
|
|
|
|
* In case p was allocated the same task_struct pointer as some
|
|
|
|
* other recently-exited task, make sure p is disassociated from
|
|
|
|
* any cpu that may have run that now-exited task recently.
|
|
|
|
* Otherwise we could erroneously skip reloading the FPSIMD
|
|
|
|
* registers for p.
|
|
|
|
*/
|
|
|
|
fpsimd_flush_task_state(p);
|
|
|
|
|
2020-03-13 17:04:56 +08:00
|
|
|
ptrauth_thread_init_kernel(p);
|
|
|
|
|
2012-10-22 03:56:52 +08:00
|
|
|
if (likely(!(p->flags & PF_KTHREAD))) {
|
|
|
|
*childregs = *current_pt_regs();
|
2012-10-05 19:31:20 +08:00
|
|
|
childregs->regs[0] = 0;
|
2015-05-27 22:39:40 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Read the current TLS pointer from tpidr_el0 as it may be
|
|
|
|
* out-of-sync with the saved value.
|
|
|
|
*/
|
2016-09-08 20:55:38 +08:00
|
|
|
*task_user_tls(p) = read_sysreg(tpidr_el0);
|
2015-05-27 22:39:40 +08:00
|
|
|
|
|
|
|
if (stack_start) {
|
|
|
|
if (is_compat_thread(task_thread_info(p)))
|
2012-10-18 12:55:54 +08:00
|
|
|
childregs->compat_sp = stack_start;
|
2015-05-27 22:39:40 +08:00
|
|
|
else
|
2012-10-18 12:55:54 +08:00
|
|
|
childregs->sp = stack_start;
|
2012-10-05 19:31:20 +08:00
|
|
|
}
|
2015-05-27 22:39:40 +08:00
|
|
|
|
2012-03-05 19:49:28 +08:00
|
|
|
/*
|
2020-01-03 01:24:08 +08:00
|
|
|
* If a TLS pointer was passed to clone, use it for the new
|
|
|
|
* thread.
|
2012-03-05 19:49:28 +08:00
|
|
|
*/
|
2012-10-05 19:31:20 +08:00
|
|
|
if (clone_flags & CLONE_SETTLS)
|
2020-01-03 01:24:08 +08:00
|
|
|
p->thread.uw.tp_value = tls;
|
2012-10-05 19:31:20 +08:00
|
|
|
} else {
|
|
|
|
memset(childregs, 0, sizeof(struct pt_regs));
|
|
|
|
childregs->pstate = PSR_MODE_EL1h;
|
2016-02-05 22:58:48 +08:00
|
|
|
if (IS_ENABLED(CONFIG_ARM64_UAO) &&
|
2016-11-08 21:56:20 +08:00
|
|
|
cpus_have_const_cap(ARM64_HAS_UAO))
|
2016-02-05 22:58:48 +08:00
|
|
|
childregs->pstate |= PSR_UAO_BIT;
|
2018-08-07 20:47:06 +08:00
|
|
|
|
2020-09-18 18:54:33 +08:00
|
|
|
spectre_v4_enable_task_mitigation(p);
|
2018-08-07 20:47:06 +08:00
|
|
|
|
2019-01-31 22:58:46 +08:00
|
|
|
if (system_uses_irq_prio_masking())
|
|
|
|
childregs->pmr_save = GIC_PRIO_IRQON;
|
|
|
|
|
2012-10-05 19:31:20 +08:00
|
|
|
p->thread.cpu_context.x19 = stack_start;
|
|
|
|
p->thread.cpu_context.x20 = stk_sz;
|
2012-03-05 19:49:28 +08:00
|
|
|
}
|
|
|
|
p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
|
2012-10-05 19:31:20 +08:00
|
|
|
p->thread.cpu_context.sp = (unsigned long)childregs;
|
2012-03-05 19:49:28 +08:00
|
|
|
|
|
|
|
ptrace_hw_copy_thread(p);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
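The recycled task_struct scenario that `fpsimd_flush_task_state(p)` guards against in `copy_thread()` can be replayed with a small model. Everything named `model_*` or `MODEL_*` below is hypothetical; in particular, using an out-of-range CPU number as the "loaded nowhere" sentinel is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2
#define MODEL_NO_CPU  MODEL_NR_CPUS	/* sentinel: state loaded on no CPU */

struct model_task {
	unsigned int fpsimd_cpu;	/* CPU believed to hold this task's registers */
};

/* Per-CPU record of whose state was loaded last (models fpsimd_last_state). */
const struct model_task *model_last_state[MODEL_NR_CPUS];

/* Load t's registers onto cpu and record the binding on both sides. */
void model_load(struct model_task *t, unsigned int cpu)
{
	assert(t != NULL && cpu < MODEL_NR_CPUS);
	model_last_state[cpu] = t;
	t->fpsimd_cpu = cpu;
}

/* The sched-in check: true means the register reload would be skipped. */
bool model_can_skip_reload(const struct model_task *t, unsigned int cpu)
{
	assert(t != NULL && cpu < MODEL_NR_CPUS);
	return model_last_state[cpu] == t && t->fpsimd_cpu == cpu;
}

/* Fork: the child inherits the parent's fields; with_flush models the fix. */
void model_fork(struct model_task *child, const struct model_task *parent,
		bool with_flush)
{
	assert(child != NULL && parent != NULL);
	*child = *parent;
	if (with_flush)
		child->fpsimd_cpu = MODEL_NO_CPU;
}
```

Replaying the five steps (P loads on CPU 0, X loads on CPU 0, X exits, P forks T into X's old struct, T is scheduled on CPU 0) makes the skip check pass without the flush and fail with it.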
|
|
|
|
|
2017-06-21 23:00:44 +08:00
|
|
|
void tls_preserve_current_state(void)
|
|
|
|
{
|
|
|
|
*task_user_tls(current) = read_sysreg(tpidr_el0);
|
|
|
|
}
|
|
|
|
|
2012-03-05 19:49:28 +08:00
|
|
|
static void tls_thread_switch(struct task_struct *next)
|
|
|
|
{
|
2017-06-21 23:00:44 +08:00
|
|
|
tls_preserve_current_state();
|
2012-03-05 19:49:28 +08:00
|
|
|
|
2017-11-14 22:33:28 +08:00
|
|
|
if (is_compat_thread(task_thread_info(next)))
|
2018-03-28 17:50:49 +08:00
|
|
|
write_sysreg(next->thread.uw.tp_value, tpidrro_el0);
|
2017-11-14 22:33:28 +08:00
|
|
|
else if (!arm64_kernel_unmapped_at_el0())
|
|
|
|
write_sysreg(0, tpidrro_el0);
|
2012-03-05 19:49:28 +08:00
|
|
|
|
2017-11-14 22:33:28 +08:00
|
|
|
write_sysreg(*task_user_tls(next), tpidr_el0);
|
2012-03-05 19:49:28 +08:00
|
|
|
}
|
|
|
|
|
2016-02-05 22:58:48 +08:00
|
|
|
/* Restore the UAO state depending on next's addr_limit */
|
2016-10-18 18:27:48 +08:00
|
|
|
void uao_thread_switch(struct task_struct *next)
|
2016-02-05 22:58:48 +08:00
|
|
|
{
|
2016-02-18 23:50:04 +08:00
|
|
|
if (IS_ENABLED(CONFIG_ARM64_UAO)) {
|
|
|
|
if (task_thread_info(next)->addr_limit == KERNEL_DS)
|
|
|
|
asm(ALTERNATIVE("nop", SET_PSTATE_UAO(1), ARM64_HAS_UAO));
|
|
|
|
else
|
|
|
|
asm(ALTERNATIVE("nop", SET_PSTATE_UAO(0), ARM64_HAS_UAO));
|
|
|
|
}
|
2016-02-05 22:58:48 +08:00
|
|
|
}
|
|
|
|
|
2019-07-22 21:53:09 +08:00
|
|
|
/*
|
|
|
|
* Force SSBS state on context-switch, since it may be lost after migrating
|
|
|
|
* from a CPU which treats the bit as RES0 in a heterogeneous system.
|
|
|
|
*/
|
|
|
|
static void ssbs_thread_switch(struct task_struct *next)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Nothing to do for kernel threads, but 'regs' may be junk
|
|
|
|
* (e.g. idle task) so check the flags and bail early.
|
|
|
|
*/
|
|
|
|
if (unlikely(next->flags & PF_KTHREAD))
|
|
|
|
return;
|
|
|
|
|
2020-02-06 18:42:58 +08:00
|
|
|
/*
|
|
|
|
* If all CPUs implement the SSBS extension, then we just need to
|
|
|
|
* context-switch the PSTATE field.
|
|
|
|
*/
|
2020-09-18 18:54:33 +08:00
|
|
|
if (cpus_have_const_cap(ARM64_SSBS))
|
2019-07-22 21:53:09 +08:00
|
|
|
return;
|
|
|
|
|
2020-09-18 18:54:33 +08:00
|
|
|
spectre_v4_enable_task_mitigation(next);
|
2019-07-22 21:53:09 +08:00
|
|
|
}
|
|
|
|
|
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-04 04:23:13 +08:00
|
|
|
/*
|
|
|
|
* We store our current task in sp_el0, which is clobbered by userspace. Keep a
|
|
|
|
* shadow copy so that we can restore this upon entry from userspace.
|
|
|
|
*
|
|
|
|
* This is *only* for exception entry from EL0, and is not valid until we
|
|
|
|
* __switch_to() a user task.
|
|
|
|
*/
|
|
|
|
DEFINE_PER_CPU(struct task_struct *, __entry_task);
|
|
|
|
|
|
|
|
static void entry_task_switch(struct task_struct *next)
|
|
|
|
{
|
|
|
|
__this_cpu_write(__entry_task, next);
|
|
|
|
}
|
|
|
|
|
2020-08-01 01:38:23 +08:00
|
|
|
/*
|
|
|
|
* ARM erratum 1418040 handling, affecting the 32bit view of CNTVCT.
|
|
|
|
 * Assuming the virtual counter is enabled from the start:
|
|
|
|
*
|
|
|
|
* - disable access when switching from a 64bit task to a 32bit task
|
|
|
|
* - enable access when switching from a 32bit task to a 64bit task
|
|
|
|
*/
|
|
|
|
static void erratum_1418040_thread_switch(struct task_struct *prev,
|
|
|
|
struct task_struct *next)
|
|
|
|
{
|
|
|
|
bool prev32, next32;
|
|
|
|
u64 val;
|
|
|
|
|
|
|
|
if (!(IS_ENABLED(CONFIG_ARM64_ERRATUM_1418040) &&
|
|
|
|
cpus_have_const_cap(ARM64_WORKAROUND_1418040)))
|
|
|
|
return;
|
|
|
|
|
|
|
|
prev32 = is_compat_thread(task_thread_info(prev));
|
|
|
|
next32 = is_compat_thread(task_thread_info(next));
|
|
|
|
|
|
|
|
if (prev32 == next32)
|
|
|
|
return;
|
|
|
|
|
|
|
|
val = read_sysreg(cntkctl_el1);
|
|
|
|
|
|
|
|
if (!next32)
|
|
|
|
val |= ARCH_TIMER_USR_VCT_ACCESS_EN;
|
|
|
|
else
|
|
|
|
val &= ~ARCH_TIMER_USR_VCT_ACCESS_EN;
|
|
|
|
|
|
|
|
write_sysreg(val, cntkctl_el1);
|
|
|
|
}
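The decision taken by erratum_1418040_thread_switch() can be modelled in plain user-space C, which may make the two cases in its comment easier to follow. This is only a sketch: `next_cntkctl()` and `check_next_cntkctl()` are hypothetical names, and `USR_VCT_ACCESS_EN` is assumed to be bit 1, standing in for the kernel's ARCH_TIMER_USR_VCT_ACCESS_EN. No system register is touched here; the register value is passed in and returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, standing in for ARCH_TIMER_USR_VCT_ACCESS_EN. */
#define USR_VCT_ACCESS_EN (UINT64_C(1) << 1)

/*
 * Hypothetical helper: given the current control value and whether the
 * outgoing and incoming tasks are 32-bit, return the value to install.
 */
uint64_t next_cntkctl(uint64_t val, bool prev32, bool next32)
{
	/* Same bitness on both sides: nothing to change. */
	if (prev32 == next32)
		return val;

	if (!next32)
		val |= USR_VCT_ACCESS_EN;	/* 32-bit -> 64-bit: enable access */
	else
		val &= ~USR_VCT_ACCESS_EN;	/* 64-bit -> 32-bit: disable access */

	return val;
}

/* Small self-check exercising both transitions. */
void check_next_cntkctl(void)
{
	assert(next_cntkctl(USR_VCT_ACCESS_EN, false, true) == 0);
	assert(next_cntkctl(0, true, false) == USR_VCT_ACCESS_EN);
}
```

Only transitions between bitnesses modify the value, which is why the real function can return early before reading cntkctl_el1.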

/*
 * Thread switching.
 */
__notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
				struct task_struct *next)
{
	struct task_struct *last;

	fpsimd_thread_switch(next);
	tls_thread_switch(next);
	hw_breakpoint_thread_switch(next);
	contextidr_thread_switch(next);
	entry_task_switch(next);
	uao_thread_switch(next);
	ssbs_thread_switch(next);
	erratum_1418040_thread_switch(prev, next);

	/*
	 * Complete any pending TLB or cache maintenance on this CPU in case
	 * the thread migrates to a different CPU.
membarrier: Provide expedited private command
Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs using cpumask built
from all runqueues for which current thread's mm is the same as the
thread calling sys_membarrier. It executes faster than the non-expedited
variant (no blocking). It also works on NOHZ_FULL configurations.
Scheduler-wise, it requires a memory barrier before and after context
switching between processes (which have different mm). The memory
barrier before context switch is already present. For the barrier after
context switch:
* Our TSO archs can do RELEASE without being a full barrier. Look at
x86 spin_unlock() being a regular STORE for example. But for those
archs, all atomics imply smp_mb and all of them have atomic ops in
switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full
barrier.
* From all weakly ordered machines, only ARM64 and PPC can do RELEASE,
the rest does indeed do smp_mb(), so there the spin_unlock() is a full
barrier and we're good.
* ARM64 has a very heavy barrier in switch_to(), which suffices.
* PPC just removed its barrier from switch_to(), but appears to be
talking about adding something to switch_mm(). So add a
smp_mb__after_unlock_lock() for now, until this is settled on the PPC
side.
Changes since v3:
- Properly document the memory barriers provided by each architecture.
Changes since v2:
- Address comments from Peter Zijlstra,
- Add smp_mb__after_unlock_lock() after finish_lock_switch() in
finish_task_switch() to add the memory barrier we need after storing
to rq->curr. This is much simpler than the previous approach relying
on atomic_dec_and_test() in mmdrop(), which actually added a memory
barrier in the common case of switching between userspace processes.
- Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full
kernel, rather than having the whole membarrier system call returning
-ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full.
Adapt the CMD_QUERY mask accordingly.
Changes since v1:
- move membarrier code under kernel/sched/ because it uses the
scheduler runqueue,
- only add the barrier when we switch from a kernel thread. The case
where we switch from a user-space thread is already handled by
the atomic_dec_and_test() in mmdrop().
- add a comment to mmdrop() documenting the requirement on the implicit
memory barrier.
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: gromer@google.com
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Dave Watson <davejwatson@fb.com>
2017-07-29 04:40:40 +08:00
	 * This full barrier is also required by the membarrier system
	 * call.
	 */
	dsb(ish);

	/* the actual thread switch */
	last = cpu_switch_to(prev, next);

	return last;
}

unsigned long get_wchan(struct task_struct *p)
{
	struct stackframe frame;
	unsigned long stack_page, ret = 0;
	int count = 0;
	if (!p || p == current || p->state == TASK_RUNNING)
		return 0;

	stack_page = (unsigned long)try_get_task_stack(p);
	if (!stack_page)
		return 0;

	start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));

	do {
		if (unwind_frame(p, &frame))
			goto out;
		if (!in_sched_functions(frame.pc)) {
			ret = frame.pc;
			goto out;
		}
	} while (count++ < 16);

out:
	put_task_stack(p);
	return ret;
}

unsigned long arch_align_stack(unsigned long sp)
{
	if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
		sp -= get_random_int() & ~PAGE_MASK;
	return sp & ~0xf;
}
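The arithmetic in arch_align_stack() can be tried out in user space. The sketch below is an illustration under stated assumptions, not kernel code: it assumes 4 KiB pages, `align_stack_sketch()` is a hypothetical name, the `rnd` parameter stands in for get_random_int(), and the `randomize` flag stands in for the personality and randomize_va_space checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 4 KiB pages; the mask has the same shape as the kernel's PAGE_MASK. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Hypothetical model of the stack adjustment: optionally move sp down by
 * a sub-page random offset, then round down to a 16-byte boundary.
 */
unsigned long align_stack_sketch(unsigned long sp, unsigned int rnd,
				 bool randomize)
{
	if (randomize)
		sp -= rnd & ~SKETCH_PAGE_MASK;	/* offset in [0, PAGE_SIZE - 1] */
	return sp & ~0xfUL;			/* 16-byte alignment */
}

/* Small self-check: the result is always 16-byte aligned. */
void check_align_stack_sketch(void)
{
	assert(align_stack_sketch(0x7ffff00fUL, 0, false) == 0x7ffff000UL);
	assert(align_stack_sketch(0x7ffff000UL, 0x12345u, true) % 16 == 0);
}
```

Because the random offset is masked to less than one page, the randomised stack pointer stays within one page below the original value.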

/*
 * Called from setup_new_exec() after (COMPAT_)SET_PERSONALITY.
 */
void arch_setup_new_exec(void)
{
	current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
arm64: add basic pointer authentication support
This patch adds basic support for pointer authentication, allowing
userspace to make use of APIAKey, APIBKey, APDAKey, APDBKey, and
APGAKey. The kernel maintains key values for each process (shared by all
threads within), which are initialised to random values at exec() time.
The ID_AA64ISAR1_EL1.{APA,API,GPA,GPI} fields are exposed to userspace,
to describe that pointer authentication instructions are available and
that the kernel is managing the keys. Two new hwcaps are added for the
same reason: PACA (for address authentication) and PACG (for generic
authentication).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[will: Fix sizeof() usage and unroll address key initialisation]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-08 02:39:25 +08:00

	ptrauth_thread_init_user(current);
}

#ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
/*
 * Control the relaxed ABI allowing tagged user addresses into the kernel.
 */
static unsigned int tagged_addr_disabled;

long set_tagged_addr_ctrl(unsigned long arg)
{
	if (is_compat_task())
		return -EINVAL;
	if (arg & ~PR_TAGGED_ADDR_ENABLE)
		return -EINVAL;

	/*
	 * Do not allow the enabling of the tagged address ABI if globally
	 * disabled via sysctl abi.tagged_addr_disabled.
	 */
	if (arg & PR_TAGGED_ADDR_ENABLE && tagged_addr_disabled)
		return -EINVAL;

	update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);

	return 0;
}

long get_tagged_addr_ctrl(void)
{
	if (is_compat_task())
		return -EINVAL;

	if (test_thread_flag(TIF_TAGGED_ADDR))
		return PR_TAGGED_ADDR_ENABLE;

	return 0;
}

/*
 * Global sysctl to disable the tagged user addresses support. This control
 * only prevents the tagged address ABI enabling via prctl() and does not
 * disable it for tasks that already opted in to the relaxed ABI.
 */

static struct ctl_table tagged_addr_sysctl_table[] = {
	{
		.procname	= "tagged_addr_disabled",
		.mode		= 0644,
		.data		= &tagged_addr_disabled,
		.maxlen		= sizeof(int),
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
	{ }
};

static int __init tagged_addr_init(void)
{
	if (!register_sysctl("abi", tagged_addr_sysctl_table))
		return -EINVAL;
	return 0;
}

core_initcall(tagged_addr_init);
#endif	/* CONFIG_ARM64_TAGGED_ADDR_ABI */
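The checks in set_tagged_addr_ctrl() and get_tagged_addr_ctrl() above, together with the sysctl comment, can be summarised by a small user-space model. Everything named here is hypothetical: `struct task_model` replaces the per-thread state, `SKETCH_TAGGED_ADDR_ENABLE` is assumed to mirror PR_TAGGED_ADDR_ENABLE as bit 0, and the `globally_disabled` parameter replaces the tagged_addr_disabled sysctl value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed to mirror PR_TAGGED_ADDR_ENABLE (bit 0). */
#define SKETCH_TAGGED_ADDR_ENABLE (1UL << 0)

/* Hypothetical stand-in for the per-task state the kernel consults. */
struct task_model {
	bool compat;		/* models is_compat_task() */
	bool tagged_addr;	/* models the TIF_TAGGED_ADDR thread flag */
};

long set_ctrl_model(struct task_model *t, unsigned long arg,
		    bool globally_disabled)
{
	if (t->compat)
		return -EINVAL;
	if (arg & ~SKETCH_TAGGED_ADDR_ENABLE)
		return -EINVAL;		/* unknown bits are rejected */
	if ((arg & SKETCH_TAGGED_ADDR_ENABLE) && globally_disabled)
		return -EINVAL;		/* enabling is blocked, disabling is not */

	t->tagged_addr = (arg & SKETCH_TAGGED_ADDR_ENABLE) != 0;
	return 0;
}

long get_ctrl_model(const struct task_model *t)
{
	if (t->compat)
		return -EINVAL;
	return t->tagged_addr ? (long)SKETCH_TAGGED_ADDR_ENABLE : 0;
}

/* Self-check: a task that opted in keeps the ABI after a global disable. */
void check_ctrl_model(void)
{
	struct task_model t = { false, false };

	assert(set_ctrl_model(&t, SKETCH_TAGGED_ADDR_ENABLE, false) == 0);
	assert(set_ctrl_model(&t, SKETCH_TAGGED_ADDR_ENABLE, true) == -EINVAL);
	assert(get_ctrl_model(&t) == (long)SKETCH_TAGGED_ADDR_ENABLE);
}
```

The failed second call leaves the flag untouched, which matches the comment's statement that the sysctl does not disable the ABI for tasks that already opted in.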
arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.
However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.
 * Task 1:
        PAN == 0 ---|                       |---------------
                    |                       |<- return from IRQ, PSTATE.PAN = 0
                    | <- IRQ                |
                    +--------+ <- preempt() +--
                                            ^
                                            |
                                            reschedule Task 1, PSTATE.PAN == 1
 * Init:
        --------------------+------------------------
                            ^
                            |
                            enable_cpu_features
                            set PSTATE.PAN on all CPUs
Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).
Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.
This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early, the
exception is BRK for WARN_ON(), but as this uses do_debug_exception()
which keeps IRQs masked, it shouldn't call schedule().
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[james: Replaced a really cool hack, with an even simpler static key in C.
expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-10-16 01:25:44 +08:00

asmlinkage void __sched arm64_preempt_schedule_irq(void)
{
	lockdep_assert_irqs_disabled();

	/*
	 * Preempting a task from an IRQ means we leave copies of PSTATE
	 * on the stack. cpufeature's enable calls may modify PSTATE, but
	 * resuming one of these preempted tasks would undo those changes.
	 *
	 * Only allow a task to be preempted once cpufeatures have been
	 * enabled.
	 */
	if (system_capabilities_finalized())
		preempt_schedule_irq();
}

#ifdef CONFIG_BINFMT_ELF
int arch_elf_adjust_prot(int prot, const struct arch_elf_state *state,
			 bool has_interp, bool is_interp)
{
	/*
	 * For dynamically linked executables the interpreter is
	 * responsible for setting PROT_BTI on everything except
	 * itself.
	 */
	if (is_interp != has_interp)
		return prot;

	if (!(state->flags & ARM64_ELF_BTI))
		return prot;

	if (prot & PROT_EXEC)
		prot |= PROT_BTI;

	return prot;
}
#endif
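The `is_interp != has_interp` test in arch_elf_adjust_prot() is compact, so a user-space model may help show which mappings the kernel itself marks with PROT_BTI. This is a sketch under assumptions: `adjust_prot_sketch()` is a hypothetical name, `SKETCH_PROT_BTI` (0x10) and `SKETCH_ELF_BTI` (bit 0) are assumed stand-ins for the arm64 PROT_BTI and ARM64_ELF_BTI values, and the ELF state is reduced to a plain flags word.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

#define SKETCH_PROT_BTI 0x10		/* assumed value of arm64 PROT_BTI */
#define SKETCH_ELF_BTI  (1u << 0)	/* assumed stand-in for ARM64_ELF_BTI */

/*
 * Hypothetical model: flags replaces state->flags, everything else follows
 * the same three checks as the kernel function.
 */
int adjust_prot_sketch(int prot, unsigned int flags,
		       bool has_interp, bool is_interp)
{
	/*
	 * Proceed only for a static executable (false, false) or for the
	 * interpreter itself (true, true). The main image of a dynamically
	 * linked executable (true, false) is left to the interpreter.
	 */
	if (is_interp != has_interp)
		return prot;

	if (!(flags & SKETCH_ELF_BTI))
		return prot;

	if (prot & PROT_EXEC)
		prot |= SKETCH_PROT_BTI;

	return prot;
}

/* Self-check covering the dynamically linked case. */
void check_adjust_prot_sketch(void)
{
	int rx = PROT_READ | PROT_EXEC;

	/* Main image of a dynamic executable: unchanged. */
	assert(adjust_prot_sketch(rx, SKETCH_ELF_BTI, true, false) == rx);
	/* The interpreter's own executable segments: BTI added. */
	assert(adjust_prot_sketch(rx, SKETCH_ELF_BTI, true, true) ==
	       (rx | SKETCH_PROT_BTI));
}
```

Non-executable mappings and objects without the BTI flag pass through unchanged in every case.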