Commit Graph

1122399 Commits

Author SHA1 Message Date
Lukas Bulwahn 5f4853e810 MAINTAINERS: rectify file entry in ALIBABA PMU DRIVER
Commit cf7b61073e ("drivers/perf: add DDR Sub-System Driveway PMU driver
for Yitian 710 SoC") adds the DDR Sub-System Driveway PMU driver here:

  drivers/perf/alibaba_uncore_drw_pmu.c

The file entry in MAINTAINERS for the ALIBABA PMU DRIVER, introduced with
commit d813a19e7d ("MAINTAINERS: add maintainers for Alibaba' T-Head PMU
driver"), however refers to:

  drivers/perf/alibaba_uncore_dwr_pmu.c

Note the swapping of characters.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken file pattern.

Repair this file entry in ALIBABA PMU DRIVER.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220929122937.20132-1-lukas.bulwahn@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-10-07 14:48:03 +01:00
Geert Uytterhoeven e08d07dd9f drivers/perf: ALIBABA_UNCORE_DRW_PMU should depend on ACPI
The Alibaba T-Head Yitian 710 DDR Sub-system Driveway PMU driver relies
solely on ACPI for matching.  Hence add a dependency on ACPI, to prevent
asking the user about this driver when configuring a kernel without ACPI
support.

Fixes: cf7b61073e ("drivers/perf: add DDR Sub-System Driveway PMU driver for Yitian 710 SoC")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/2a4407bb598285660fa5e604e56823ddb12bb0aa.1664285774.git.geert+renesas@glider.be
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-10-07 14:47:44 +01:00
Sun Ke ad0112f2d5 drivers/perf: fix return value check in ali_drw_pmu_probe()
In case of error, devm_ioremap_resource() returns ERR_PTR(),
and never returns NULL. The NULL test in the return value
check should be replaced with IS_ERR().

Fixes: cf7b61073e ("drivers/perf: add DDR Sub-System Driveway PMU driver for Yitian 710 SoC")
Signed-off-by: Sun Ke <sunke32@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220924032127.313156-1-sunke32@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-10-07 14:47:38 +01:00
James Morse 171df58028 arm64: errata: Add Cortex-A55 to the repeat tlbi list
Cortex-A55 is affected by an erratum where in rare circumstances the
CPUs may not handle a race between a break-before-make sequence on one
CPU, and another CPU accessing the same page. This could allow a store
to a page that has been unmapped.

Work around this by adding the affected CPUs to the list that needs
TLB sequences to be done twice.

Signed-off-by: James Morse <james.morse@arm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220930131959.3082594-1-james.morse@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-10-07 14:42:20 +01:00
Mark Brown e1567b4f0e arm64/sysreg: Fix typo in SCTR_EL1.SPINTMASK
SPINTMASK was typoed as SPINMASK, fix it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221005181642.711734-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-10-07 14:30:11 +01:00
Nathan Chancellor d2995249a2 arm64: alternatives: Use vdso/bits.h instead of linux/bits.h
When building with CONFIG_LTO after commit ba00c2a04f ("arm64: fix the
build with binutils 2.27"), the following build error occurs:

  In file included from arch/arm64/kernel/module-plts.c:6:
  In file included from include/linux/elf.h:6:
  In file included from arch/arm64/include/asm/elf.h:8:
  In file included from arch/arm64/include/asm/hwcap.h:9:
  In file included from arch/arm64/include/asm/cpufeature.h:9:
  In file included from arch/arm64/include/asm/alternative-macros.h:5:
  In file included from include/linux/bits.h:22:
  In file included from include/linux/build_bug.h:5:
  In file included from include/linux/compiler.h:248:
  In file included from arch/arm64/include/asm/rwonce.h:71:
  include/asm-generic/rwonce.h:67:9: error: expected string literal in 'asm'
          return __READ_ONCE(*(unsigned long *)addr);
                ^
  arch/arm64/include/asm/rwonce.h:43:16: note: expanded from macro '__READ_ONCE'
                  asm volatile(__LOAD_RCPC(b, %w0, %1)                    \
                              ^
  arch/arm64/include/asm/rwonce.h:17:2: note: expanded from macro '__LOAD_RCPC'
          ALTERNATIVE(                                                    \
          ^

Similar to the issue resolved by commit 0072dc1b53 ("arm64: avoid
BUILD_BUG_ON() in alternative-macros"), there is a circular include
dependency through <linux/bits.h> when CONFIG_LTO is enabled due to
<asm/rwonce.h> appearing in the include chain before the contents of
<asm/alternative-macros.h>, which results in ALTERNATIVE() not getting
expanded properly because it has not been defined yet.

Avoid this issue by including <vdso/bits.h>, which includes the
definition of the BIT() macro, instead of <linux/bits.h>, as BIT() is the
only macro from bits.h that is relevant to this header.

Fixes: ba00c2a04f ("arm64: fix the build with binutils 2.27")
Link: https://github.com/ClangBuiltLinux/linux/issues/1728
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221003193759.1141709-1-nathan@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-10-05 10:44:44 +01:00
Catalin Marinas 53630a1f61 Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  : Miscellaneous patches
  arm64/kprobe: Optimize the performance of patching single-step slot
  ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
  arm64/mm: fold check for KFENCE into can_set_direct_map()
  arm64: uaccess: simplify uaccess_mask_ptr()
  arm64: mte: move register initialization to C
  arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
  arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
  arm64: support huge vmalloc mappings
  arm64: spectre: increase parameters that can be used to turn off bhb mitigation individually
  arm64: run softirqs on the per-CPU IRQ stack
  arm64: compat: Implement misalignment fixups for multiword loads
2022-09-30 09:18:26 +01:00
Catalin Marinas c704cf27a1 Merge branch 'for-next/alternatives' into for-next/core
* for-next/alternatives:
  : Alternatives (code patching) improvements
  arm64: fix the build with binutils 2.27
  arm64: avoid BUILD_BUG_ON() in alternative-macros
  arm64: alternatives: add shared NOP callback
  arm64: alternatives: add alternative_has_feature_*()
  arm64: alternatives: have callbacks take a cap
  arm64: alternatives: make alt_region const
  arm64: alternatives: hoist print out of __apply_alternatives()
  arm64: alternatives: proton-pack: prepare for cap changes
  arm64: alternatives: kvm: prepare for cap changes
  arm64: cpufeature: make cpus_have_cap() noinstr-safe
2022-09-30 09:18:22 +01:00
Catalin Marinas c397623262 Merge branch 'for-next/kselftest' into for-next/core
* for-next/kselftest: (28 commits)
  : Kselftest updates for arm64
  kselftest/arm64: Handle EINTR while reading data from children
  kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
  kselftest/arm64: Don't repeat termination handler for fp-stress
  kselftest/arm64: Don't enable v8.5 for MTE selftest builds
  kselftest/arm64: Fix typo in hwcap check
  kselftest/arm64: Add hwcap test for RNG
  kselftest/arm64: Add SVE 2 to the tested hwcaps
  kselftest/arm64: Add missing newline in hwcap output
  kselftest/arm64: Fix spelling misakes of signal names
  kselftest/arm64: Enforce actual ABI for SVE syscalls
  kselftest/arm64: Correct buffer allocation for SVE Z registers
  kselftest/arm64: Include larger SVE and SME VLs in signal tests
  kselftest/arm64: Allow larger buffers in get_signal_context()
  kselftest/arm64: Preserve any EXTRA_CONTEXT in handle_signal_copyctx()
  kselftest/arm64: Validate contents of EXTRA_CONTEXT blocks
  kselftest/arm64: Only validate each signal context once
  kselftest/arm64: Remove unneeded protype for validate_extra_context()
  kselftest/arm64: Fix validation of EXTRA_CONTEXT signal context location
  kselftest/arm64: Fix validatation termination record after EXTRA_CONTEXT
  kselftest/arm64: Validate signal ucontext in place
  ...
2022-09-30 09:18:11 +01:00
Catalin Marinas b23ec74cbd Merge branches 'for-next/doc', 'for-next/sve', 'for-next/sysreg', 'for-next/gettimeofday', 'for-next/stacktrace', 'for-next/atomics', 'for-next/el1-exceptions', 'for-next/a510-erratum-2658417', 'for-next/defconfig', 'for-next/tpidr2_el0' and 'for-next/ftrace', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
  arm64/sve: Add Perf extensions documentation
  perf: arm64: Add SVE vector granule register to user regs
  MAINTAINERS: add maintainers for Alibaba' T-Head PMU driver
  drivers/perf: add DDR Sub-System Driveway PMU driver for Yitian 710 SoC
  docs: perf: Add description for Alibaba's T-Head PMU driver

* for-next/doc:
  : Documentation/arm64 updates
  arm64/sve: Document our actual ABI for clearing registers on syscall

* for-next/sve:
  : SVE updates
  arm64/sysreg: Add hwcap for SVE EBF16

* for-next/sysreg: (35 commits)
  : arm64 system registers generation (more conversions)
  arm64/sysreg: Fix a few missed conversions
  arm64/sysreg: Convert ID_AA64AFRn_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64DFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64FDR0_EL1 to automatic generation
  arm64/sysreg: Use feature numbering for PMU and SPE revisions
  arm64/sysreg: Add _EL1 into ID_AA64DFR0_EL1 definition names
  arm64/sysreg: Align field names in ID_AA64DFR0_EL1 with architecture
  arm64/sysreg: Add defintion for ALLINT
  arm64/sysreg: Convert SCXTNUM_EL1 to automatic generation
  arm64/sysreg: Convert TIPDR_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64PFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64PFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64MMFR2_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64MMFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_AA64MMFR0_EL1 to automatic generation
  arm64/sysreg: Convert HCRX_EL2 to automatic generation
  arm64/sysreg: Standardise naming of ID_AA64PFR1_EL1 SME enumeration
  arm64/sysreg: Standardise naming of ID_AA64PFR1_EL1 BTI enumeration
  arm64/sysreg: Standardise naming of ID_AA64PFR1_EL1 fractional version fields
  arm64/sysreg: Standardise naming for MTE feature enumeration
  ...

* for-next/gettimeofday:
  : Use self-synchronising counter access in gettimeofday() (if FEAT_ECV)
  arm64: vdso: use SYS_CNTVCTSS_EL0 for gettimeofday
  arm64: alternative: patch alternatives in the vDSO
  arm64: module: move find_section to header

* for-next/stacktrace:
  : arm64 stacktrace cleanups and improvements
  arm64: stacktrace: track hyp stacks in unwinder's address space
  arm64: stacktrace: track all stack boundaries explicitly
  arm64: stacktrace: remove stack type from fp translator
  arm64: stacktrace: rework stack boundary discovery
  arm64: stacktrace: add stackinfo_on_stack() helper
  arm64: stacktrace: move SDEI stack helpers to stacktrace code
  arm64: stacktrace: rename unwind_next_common() -> unwind_next_frame_record()
  arm64: stacktrace: simplify unwind_next_common()
  arm64: stacktrace: fix kerneldoc comments

* for-next/atomics:
  : arm64 atomics improvements
  arm64: atomic: always inline the assembly
  arm64: atomics: remove LL/SC trampolines

* for-next/el1-exceptions:
  : Improve the reporting of EL1 exceptions
  arm64: rework BTI exception handling
  arm64: rework FPAC exception handling
  arm64: consistently pass ESR_ELx to die()
  arm64: die(): pass 'err' as long
  arm64: report EL1 UNDEFs better

* for-next/a510-erratum-2658417:
  : Cortex-A510: 2658417: remove BF16 support due to incorrect result
  arm64: errata: remove BF16 HWCAP due to incorrect result on Cortex-A510
  arm64: cpufeature: Expose get_arm64_ftr_reg() outside cpufeature.c
  arm64: cpufeature: Force HWCAP to be based on the sysreg visible to user-space

* for-next/defconfig:
  : arm64 defconfig updates
  arm64: defconfig: Add Coresight as module
  arm64: Enable docker support in defconfig
  arm64: defconfig: Enable memory hotplug and hotremove config
  arm64: configs: Enable all PMUs provided by Arm

* for-next/tpidr2_el0:
  : arm64 ptrace() support for TPIDR2_EL0
  kselftest/arm64: Add coverage of TPIDR2_EL0 ptrace interface
  arm64/ptrace: Support access to TPIDR2_EL0
  arm64/ptrace: Document extension of NT_ARM_TLS to cover TPIDR2_EL0
  kselftest/arm64: Add test coverage for NT_ARM_TLS

* for-next/ftrace:
  : arm64 ftraces updates/fixes
  arm64: ftrace: fix module PLTs with mcount
  arm64: module: Remove unused plt_entry_is_initialized()
  arm64: module: Make plt_equals_entry() static
2022-09-30 09:17:57 +01:00
Liao Chang a0caebbd04 arm64/kprobe: Optimize the performance of patching single-step slot
Single-step slot would not be used until kprobe is enabled, that means
no race condition occurs on it under SMP, hence it is safe to pacth ss
slot without stopping machine.

Since I and D caches are coherent within single-step slot from
aarch64_insn_patch_text_nosync(), hence no need to do it again via
flush_icache_range().

Acked-by: Will Deacon <will@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Link: https://lore.kernel.org/r/20220927022435.129965-4-liaochang1@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-30 09:17:15 +01:00
James Clark d56f66d2bd arm64: defconfig: Add Coresight as module
Add Coresight to defconfig so that build errors are caught.
CONFIG_CORESIGHT_SOURCE_ETM4X is excluded because it depends on
CONFIG_PID_IN_CONTEXTIDR which has a performance cost.

Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20220922142400.478815-2-james.clark@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 18:14:20 +01:00
Mark Brown a711987490 kselftest/arm64: Handle EINTR while reading data from children
Currently we treat any error when reading from the child as a failure and
don't read any more output from that child as a result. This ignores the
fact that it is valid for read() to return EINTR as the error code if there
is a signal pending so we could stop handling the output of children,
especially during exit when we will get some SIGCHLD signals delivered to
us. Fix this by pulling the read handling out into a separate function
which returns a flag if reads should be continued and wrapping it in a
loop.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220921181345.618085-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 18:12:37 +01:00
Mark Brown dd72dd7cd5 kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
Once we have started exiting the termination handler will have the same
effect as what we're already running so set the termination flag at that
point.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220921181345.618085-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 18:12:37 +01:00
Mark Brown c38d381fff kselftest/arm64: Don't repeat termination handler for fp-stress
When fp-stress gets a termination signal it sets a flag telling itself to
exit and sends a termination signal to all the children. If the flag is set
then don't bother repeating this process, it isn't going to accomplish
anything other than consume CPU time which can be an issue when running in
emulation.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220921181345.618085-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 18:12:37 +01:00
Xiu Jianfeng 8c6e3657be ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
Add missing __init/__exit annotations to module init/exit funcs.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Link: https://lore.kernel.org/r/20220911034747.132098-1-xiujianfeng@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 18:04:25 +01:00
Mike Rapoport b9dd04a20f arm64/mm: fold check for KFENCE into can_set_direct_map()
KFENCE requires linear map to be mapped at page granularity, so that it
is possible to protect/unprotect single pages, just like with
rodata_full and DEBUG_PAGEALLOC.

Instead of repating

	can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE)

make can_set_direct_map() handle the KFENCE case.

This also prevents potential false positives in kernel_page_present()
that may return true for non-present page if CONFIG_KFENCE is enabled.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220921074841.382615-1-rppt@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:59:28 +01:00
Mark Rutland 8cfb08575c arm64: ftrace: fix module PLTs with mcount
Li Huafei reports that mcount-based ftrace with module PLTs was broken
by commit:

  a625357997 ("arm64: ftrace: consistently handle PLTs.")

When a module PLTs are used and a module is loaded sufficiently far away
from the kernel, we'll create PLTs for any branches which are
out-of-range. These are separate from the special ftrace trampoline
PLTs, which the module PLT code doesn't directly manipulate.

When mcount is in use this is a problem, as each mcount callsite in a
module will be initialized to point to a module PLT, but since commit
a625357997 ftrace_make_nop() will assume that the callsite has
been initialized to point to the special ftrace trampoline PLT, and
ftrace_find_callable_addr() rejects other cases.

This means that when ftrace tries to initialize a callsite via
ftrace_make_nop(), the call to ftrace_find_callable_addr() will find
that the `_mcount` stub is out-of-range and is not handled by the ftrace
PLT, resulting in a splat:

| ftrace_test: loading out-of-tree module taints kernel.
| ftrace: no module PLT for _mcount
| ------------[ ftrace bug ]------------
| ftrace failed to modify
| [<ffff800029180014>] 0xffff800029180014
|  actual:   44:00:00:94
| Initializing ftrace call sites
| ftrace record flags: 2000000
|  (0)
|  expected tramp: ffff80000802eb3c
| ------------[ cut here ]------------
| WARNING: CPU: 3 PID: 157 at kernel/trace/ftrace.c:2120 ftrace_bug+0x94/0x270
| Modules linked in:
| CPU: 3 PID: 157 Comm: insmod Tainted: G           O       6.0.0-rc6-00151-gcd722513a189-dirty #22
| Hardware name: linux,dummy-virt (DT)
| pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : ftrace_bug+0x94/0x270
| lr : ftrace_bug+0x21c/0x270
| sp : ffff80000b2bbaf0
| x29: ffff80000b2bbaf0 x28: 0000000000000000 x27: ffff0000c4d38000
| x26: 0000000000000001 x25: ffff800009d7e000 x24: ffff0000c4d86e00
| x23: 0000000002000000 x22: ffff80000a62b000 x21: ffff8000098ebea8
| x20: ffff0000c4d38000 x19: ffff80000aa24158 x18: ffffffffffffffff
| x17: 0000000000000000 x16: 0a0d2d2d2d2d2d2d x15: ffff800009aa9118
| x14: 0000000000000000 x13: 6333626532303830 x12: 3030303866666666
| x11: 203a706d61727420 x10: 6465746365707865 x9 : 3362653230383030
| x8 : c0000000ffffefff x7 : 0000000000017fe8 x6 : 000000000000bff4
| x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : 0000000000000001
| x2 : ad2cb14bb5438900 x1 : 0000000000000000 x0 : 0000000000000022
| Call trace:
|  ftrace_bug+0x94/0x270
|  ftrace_process_locs+0x308/0x430
|  ftrace_module_init+0x44/0x60
|  load_module+0x15b4/0x1ce8
|  __do_sys_init_module+0x1ec/0x238
|  __arm64_sys_init_module+0x24/0x30
|  invoke_syscall+0x54/0x118
|  el0_svc_common.constprop.4+0x84/0x100
|  do_el0_svc+0x3c/0xd0
|  el0_svc+0x1c/0x50
|  el0t_64_sync_handler+0x90/0xb8
|  el0t_64_sync+0x15c/0x160
| ---[ end trace 0000000000000000 ]---
| ---------test_init-----------

Fix this by reverting to the old behaviour of ignoring the old
instruction when initialising an mcount callsite in a module, which was
the behaviour prior to commit a625357997.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: a625357997 ("arm64: ftrace: consistently handle PLTs.")
Reported-by: Li Huafei <lihuafei1@huawei.com>
Link: https://lore.kernel.org/linux-arm-kernel/20220929094134.99512-1-lihuafei1@huawei.com
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220929134525.798593-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:47:54 +01:00
Li Huafei 5de2291605 arm64: module: Remove unused plt_entry_is_initialized()
Since commit f1a54ae9af ("arm64: module/ftrace: intialize PLT at load
time"), plt_entry_is_initialized() is unused anymore , so remove it.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220929094134.99512-3-lihuafei1@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:47:18 +01:00
Li Huafei 3fb420f56c arm64: module: Make plt_equals_entry() static
Since commit 4e69ecf4da ("arm64/module: ftrace: deal with place
relative nature of PLTs"), plt_equals_entry() is not used outside of
module-plts.c, so make it static.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220929094134.99512-2-lihuafei1@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:47:18 +01:00
Mark Rutland ba00c2a04f arm64: fix the build with binutils 2.27
Jon Hunter reports that for some toolchains the build has been broken
since commit:

  4c0bd995d7 ("arm64: alternatives: have callbacks take a cap")

... with a stream of build-time splats of the form:

|   CC      arch/arm64/kvm/hyp/vhe/debug-sr.o
| /tmp/ccY3kbki.s: Assembler messages:
| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1600: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1600: Error: junk at end of line, first unrecognized character
| is `L'
| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1723: Error: found 'L', expected: ')'
| /tmp/ccY3kbki.s:1723: Error: junk at end of line, first unrecognized character
| is `L'
| scripts/Makefile.build:249: recipe for target
| 'arch/arm64/kvm/hyp/vhe/debug-sr.o' failed

The issue here is that older versions of binutils (up to and including
2.27.0) don't like an 'L' suffix on constants. For plain assembly files,
UL() avoids this suffix, but in C files this gets added, and so for
inline assembly we can't directly use a constant defined with `UL()`.

We could avoid this by passing the constant as an input parameter, but
this isn't practical given the way we use the alternative macros.
Instead, just open code the constant without the `UL` suffix, and for
consistency do this for both the inline assembly macro and the regular
assembly macro.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 4c0bd995d7 ("arm64: alternatives: have callbacks take a cap")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/linux-arm-kernel/3cecc3a5-30b0-f0bd-c3de-9e09bd21909b@nvidia.com/
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220929150227.1028556-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:43:32 +01:00
Mark Brown 55c8a987dd kselftest/arm64: Don't enable v8.5 for MTE selftest builds
Currently we set -march=armv8.5+memtag when building the MTE selftests,
allowing the compiler to emit v8.5 and MTE instructions for anything it
generates. This means that we may get code that will generate SIGILLs when
run on older systems rather than skipping on non-MTE systems as should be
the case. Most toolchains don't select any incompatible instructions but
I have seen some reports which suggest that some may be appearing which do
so. This is also potentially problematic in that if the compiler chooses to
emit any MTE instructions for the C code it may interfere with the MTE
usage we are trying to test.

Since the only reason we are specifying this option is to allow us to
assemble MTE instructions in mte_helper.S we can avoid these issues by
moving to using a .arch directive there and adding the -march explicitly to
the toolchain support check instead of the generic CFLAGS.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220928154517.173108-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:34:40 +01:00
Mark Rutland 2305b809be arm64: uaccess: simplify uaccess_mask_ptr()
We introduced uaccess pointer masking for arm64 in commit:

  4d8efc2d5e ("arm64: Use pointer masking to limit uaccess speculation")

Which was intended to prevent speculative uaccesses to kernel memory on
CPUs where access permissions were not respected under speculation.

At the time, the uaccess primitives were occasionally used to access
kernel memory, with the maximum permitted address held in
thread_info::addr_limit. Consequently, the address masking needed to
take this dynamic limit into account.

Subsequently the uaccess primitives were reworked such that they are
only used for user memory, and as of commit:

  3d2403fd10 ("arm64: uaccess: remove set_fs()")

... the address limit was made a compile-time constant, but the logic
was otherwise unchanged.

Regardless of the configured VA size or whether TBI is in use, the
address space can be divided into three ranges:

* The TTBR0 VA range, for which any valid pointer has bit 55 *clear*,
  and any non-tag bits [63-56] must match bit 55 (i.e. must be clear).

* The TTBR1 VA range, for which any valid pointer has bit 55 *set*, and
  any non-tag bits [63-56] must match bit 55 (i.e. must be set).

* The gap between the TTBR0 and TTBR1 ranges, where bit 55 may be set or
  clear, but any access will result in a fault.

As the uaccess primitives are now only used for user memory in the TTBR0
VA range, we can prevent generation of TTBR1 addresses by clearing bit
55, which will either result in a TTBR0 address or a faulting address
between the TTBR VA ranges.

This is beneficial for code generation as:

* We no longer clobber the condition codes.

* We no longer burn a register on (TASK_SIZE_MAX - 1).

* We no longer need to consume the untagged pointer.

When building a defconfig v6.0-rc3 with GCC 12.1.0, this change makes
the resulting Image 64KiB smaller.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220922151053.3520750-1-mark.rutland@arm.com
[catalin.marinas@arm.com: remove csdb() as the bit clearing is unconditional]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-23 14:39:20 +01:00
Will Deacon aa3e49b606 arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
An arm64 'allmodconfig' build fails with GCC due to use of a C++-style
comment for the new SVE vector granule 'enum perf_event_arm_regs' entry:

  | /usr/include/asm/perf_regs.h:42:26: error: C++ style comments are not allowed in ISO C90

Use good ol' /* */ comment syntax to keep things rosey.

Link: https://lore.kernel.org/r/632cceb2.170a0220.599ec.0a3a@mx.google.com
Fixes: cbb0c02caf ("perf: arm64: Add SVE vector granule register to user regs")
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 22:15:33 +01:00
Mark Brown 33060a6490 kselftest/arm64: Fix typo in hwcap check
We use a local variable hwcap to refer to the element of the hwcaps array
which we are currently checking. When checking for the relevant hwcap bit
being set in testing we were dereferencing hwcaps rather than hwcap in
fetching the AT_HWCAP to use, which is perfectly valid C but means we were
always checking the bit was set in the hwcap for whichever feature is first
in the array. Remove the stray s.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220907113400.12982-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22 20:52:06 +01:00
Peter Collingbourne 973b9e3733 arm64: mte: move register initialization to C
If FEAT_MTE2 is disabled via the arm64.nomte command line argument on a
CPU that claims to support FEAT_MTE2, the kernel will use Tagged Normal
in the MAIR. If we interpret arm64.nomte to mean that the CPU does not
in fact implement FEAT_MTE2, setting the system register like this may
lead to UNSPECIFIED behavior. Fix it by arranging for MAIR to be set
in the C function cpu_enable_mte which is called based on the sanitized
version of the system register.

There is no need for the rest of the MTE-related system register
initialization to happen from assembly, with the exception of TCR_EL1,
which must be set to include at least TBI1 because the secondary CPUs
access KASan-allocated data structures early. Therefore, make the TCR_EL1
initialization unconditional and move the rest of the initialization to
cpu_enable_mte so that we no longer have a dependency on the unsanitized
ID register value.

Co-developed-by: Evgenii Stepanov <eugenis@google.com>
Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Evgenii Stepanov <eugenis@google.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 3b714d24ef ("arm64: mte: CPU feature detection and initial sysreg configuration")
Cc: <stable@vger.kernel.org> # 5.10.x
Link: https://lore.kernel.org/r/20220915222053.3484231-1-eugenis@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22 18:02:50 +01:00
Kefeng Wang 739e49e0fc arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
Directly check ARM64_SWAPPER_USES_SECTION_MAPS to choose base page
or PMD level huge page mapping in vmemmap_populate() to simplify
code a bit.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220920014951.196191-1-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22 16:25:51 +01:00
Will Deacon c44094eee3 arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
arch_dma_prep_coherent() is called when preparing a non-cacheable region
for a consistent DMA buffer allocation. Since the buffer pages may
previously have been written via a cacheable mapping and consequently
allocated as dirty cachelines, the purpose of this function is to remove
these dirty lines from the cache, writing them back so that the
non-coherent device is able to see them.

On arm64, this operation can be achieved with a clean to the point of
coherency; a subsequent invalidation is not required and serves little
purpose in the presence of a cacheable alias (e.g. the linear map),
since clean lines can be speculatively fetched back into the cache after
the invalidation operation has completed.

Relax the cache maintenance in arch_dma_prep_coherent() so that only a
clean, and not a clean-and-invalidate operation is performed.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220823122111.17439-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22 15:09:31 +01:00
James Clark 1f2906d1e1 arm64/sve: Add Perf extensions documentation
Document that the VG register is available in Perf samples

Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220901132658.1024635-3-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 15:06:02 +01:00
James Clark cbb0c02caf perf: arm64: Add SVE vector granule register to user regs
Dwarf based unwinding in a function that pushes SVE registers onto
the stack requires the unwinder to know the length of the SVE register
to calculate the stack offsets correctly. This was added to the Arm
specific Dwarf spec as the VG pseudo register[1].

Add the vector length at position 46 if it's requested by userspace and
SVE is supported. If it's not supported then fail to open the event.

The vector length must be on each sample because it can be changed
at runtime via a prctl or ptrace call. Also by adding it as a register
rather than a separate attribute, minimal changes will be required in an
unwinder that already indexes into the register list.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20220901132658.1024635-2-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 15:06:02 +01:00
Shuai Xue d813a19e7d MAINTAINERS: add maintainers for Alibaba' T-Head PMU driver
Add maintainers for Alibaba PMU document and driver

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220818031822.38415-4-xueshuai@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 14:09:10 +01:00
Shuai Xue cf7b61073e drivers/perf: add DDR Sub-System Driveway PMU driver for Yitian 710 SoC
Add the DDR Sub-System Driveway Performance Monitoring Unit (PMU) driver
support for Alibaba T-Head Yitian 710 SoC chip. Yitian supports DDR5/4
DRAM and targets cloud computing and HPC.

Each PMU is registered as a device in /sys/bus/event_source/devices, and
users can select event to monitor in each sub-channel, independently. For
example, ali_drw_21000 and ali_drw_21080 are two PMU devices for two
sub-channels of the same channel in die 0. And the PMU device of die 1 is
prefixed with ali_drw_400XXXXX, e.g. ali_drw_40021000.

Due to hardware limitation, one of DDRSS Driveway PMU overflow interrupt
shares the same irq number with MPAM ERR_IRQ. To register DDRSS PMU and
MPAM drivers successfully, add IRQF_SHARED flag.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Co-developed-by: Hongbo Yao <yaohongbo@linux.alibaba.com>
Signed-off-by: Hongbo Yao <yaohongbo@linux.alibaba.com>
Co-developed-by: Neng Chen <nengchen@linux.alibaba.com>
Signed-off-by: Neng Chen <nengchen@linux.alibaba.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220818031822.38415-3-xueshuai@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 14:09:10 +01:00
Shuai Xue a6f92909d6 docs: perf: Add description for Alibaba's T-Head PMU driver
Alibaba's T-Head SoC implements uncore PMU for performance and functional
debugging to facilitate system maintenance. Document it to provide guidance
on how to use it.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220914022326.88550-2-xueshuai@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 14:09:10 +01:00
Mark Brown 2ddadec220 kselftest/arm64: Add coverage of TPIDR2_EL0 ptrace interface
Extend the ptrace test support for NT_ARM_TLS to cover TPIDR2_EL0 - on
systems that support SME the NT_ARM_TLS regset can be up to 2 elements
long with the second element containing TPIDR2_EL0. On systems
supporting SME we verify that this value can be read and written while
on systems that do not support SME we verify correct truncation of reads
and writes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154921.837871-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-21 17:26:59 +01:00
Mark Brown 0027d9c6c7 arm64/ptrace: Support access to TPIDR2_EL0
SME introduces an additional EL0 register, TPIDR2_EL0, intended for use
by userspace as part of the SME. Provide ptrace access to it through the
existing NT_ARM_TLS regset used for TPIDR_EL0 by expanding it to two
registers with TPIDR2_EL0 being the second one.

Existing programs that query the size of the register set will be able
to observe the increased size of the register set. Programs that assume
the register set is single register will see no change. On systems that
do not support SME TPIDR2_EL0 will read as 0 and writes will be ignored,
support for SME should be queried via hwcaps as normal.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154921.837871-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-21 17:26:58 +01:00
Mark Brown f285da05c6 arm64/ptrace: Document extension of NT_ARM_TLS to cover TPIDR2_EL0
In order to allow debuggers to discover lazily saved SME state we need
to provide access to TPIDR2_EL0, we will extend the existing NT_ARM_TLS
used for TPIDR to also include TPIDR2_EL0 as the second register in the
regset.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154921.837871-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-21 17:26:58 +01:00
Mark Brown ecaf4d3f73 kselftest/arm64: Add test coverage for NT_ARM_TLS
In preparation for extending support for NT_ARM_TLS to cover additional
TPIDRs add some tests for the existing interface. Do this in a generic
ptrace test program to provide a place to collect additional tests in
the future.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154921.837871-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-21 17:26:58 +01:00
Vincenzo Frascino e2c1254042 arm64: Enable docker support in defconfig
The arm64 defconfig does not support the docker usecase.

Enable the missing configuration options.

The resulting .config was validated with [1].

...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_NETFILTER_XT_MARK: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_CGROUP_BPF: enabled

...

[1] https://github.com/moby/moby/blob/master/contrib/check-config.sh

Cc: Will Deacon <will@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20220907110235.14708-1-vincenzo.frascino@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-21 13:12:00 +01:00
Kefeng Wang 31dbadcc28 arm64: defconfig: Enable memory hotplug and hotremove config
Let's enable ACPI_HMAT, ACPI_HOTPLUG_MEMORY, MEMORY_HOTPLUG
and MEMORY_HOTREMOVE for more test coverage, also there are
useful for heterogeneous memory scene.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220629093524.34801-1-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-21 13:10:40 +01:00
Mark Brown 120072f48f arm64: configs: Enable all PMUs provided by Arm
The selection of PMUs enabled in the defconfig is currently a bit random
and does not include a number of those provided by Arm and present in a
fairly wide range of SoCs. Improve coverage and defconfig utility by
enabling all the Arm provided PMUs by default.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: James Clark <james.clark@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220919162753.3079869-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-21 13:07:53 +01:00
Mark Rutland 0072dc1b53 arm64: avoid BUILD_BUG_ON() in alternative-macros
Nathan reports that the build fails when using clang and LTO:

|  In file included from kernel/bounds.c:10:
|  In file included from ./include/linux/page-flags.h:10:
|  In file included from ./include/linux/bug.h:5:
|  In file included from ./arch/arm64/include/asm/bug.h:26:
|  In file included from ./include/asm-generic/bug.h:5:
|  In file included from ./include/linux/compiler.h:248:
|  In file included from ./arch/arm64/include/asm/rwonce.h:11:
|  ./arch/arm64/include/asm/alternative-macros.h:224:2: error: call to undeclared function 'BUILD_BUG_ON'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
|          BUILD_BUG_ON(feature >= ARM64_NCAPS);
|          ^
|  ./arch/arm64/include/asm/alternative-macros.h:241:2: error: call to undeclared function 'BUILD_BUG_ON'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
|          BUILD_BUG_ON(feature >= ARM64_NCAPS);
|          ^
|  2 errors generated.

... the problem being that when LTO is enabled, <asm/rwonce.h> includes
<asm/alternative-macros.h>, and causes a circular include dependency
through <linux/bug.h>. This manifests as BUILD_BUG_ON() not being
defined when used within <asm/alternative-macros.h>.

This patch avoids the problem and simplifies the include dependencies by
using compiletime_assert() instead of BUILD_BUG_ON().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 21fb26bfb0 ("arm64: alternatives: add alternative_has_feature_*()")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: http://lore.kernel.org/r/YyigTrxhE3IRPzjs@dev-arch.thelio-3990X
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220920140044.1709073-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-21 12:41:50 +01:00
Nathan Chancellor db74cd6337 arm64/sysreg: Fix a few missed conversions
After the conversion to automatically generating the ID_AA64DFR0_EL1
definition names, the build fails in a few different places because some
of the definitions were not changed to their new names along the way.
Update the names to resolve the build errors.

Fixes: c0357a73fa ("arm64/sysreg: Align field names in ID_AA64DFR0_EL1 with architecture")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220919160928.3905780-1-nathan@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-21 09:24:29 +01:00
Mark Rutland d926079f17 arm64: alternatives: add shared NOP callback
For each instance of an alternative, the compiler outputs a distinct
copy of the alternative instructions into a subsection. As the compiler
doesn't have special knowledge of alternatives, it cannot coalesce these
to save space.

In a defconfig kernel built with GCC 12.1.0, there are approximately
10,000 instances of alternative_has_feature_likely(), where the
replacement instruction is always a NOP. As NOPs are
position-independent, we don't need a unique copy per alternative
sequence.

This patch adds a callback to patch an alternative sequence with NOPs,
and make use of this in alternative_has_feature_likely(). So that this
can be used for other sites in future, this is written to patch multiple
instructions up to the original sequence length.

For NVHE, an alias is added to image-vars.h.

For modules, the callback is exported. Note that as modules are loaded
within 2GiB of the kernel, an alt_instr entry in a module can always
refer directly to the callback, and no special handling is necessary.

When building with GCC 12.1.0, the vmlinux is ~158KiB smaller, though
the resulting Image size is unchanged due to alignment constraints and
padding:

| % ls -al vmlinux-*
| -rwxr-xr-x 1 mark mark 134644592 Sep  1 14:52 vmlinux-after
| -rwxr-xr-x 1 mark mark 134486232 Sep  1 14:50 vmlinux-before
| % ls -al Image-*
| -rw-r--r-- 1 mark mark 37108224 Sep  1 14:52 Image-after
| -rw-r--r-- 1 mark mark 37108224 Sep  1 14:50 Image-before

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16 17:15:03 +01:00
Mark Rutland 21fb26bfb0 arm64: alternatives: add alternative_has_feature_*()
Currrently we use a mixture of alternative sequences and static branches
to handle features detected at boot time. For ease of maintenance we
generally prefer to use static branches in C code, but this has a few
downsides:

* Each static branch has metadata in the __jump_table section, which is
  not discarded after features are finalized. This wastes some space,
  and slows down the patching of other static branches.

* The static branches are patched at a different point in time from the
  alternatives, so changes are not atomic. This leaves a transient
  period where there could be a mismatch between the behaviour of
  alternatives and static branches, which could be problematic for some
  features (e.g. pseudo-NMI).

* More (instrumentable) kernel code is executed to patch each static
  branch, which can be risky when patching certain features (e.g.
  irqflags management for pseudo-NMI).

* When CONFIG_JUMP_LABEL=n, static branches are turned into a load of a
  flag and a conditional branch. This means it isn't safe to use such
  static branches in an alternative address space (e.g. the NVHE/PKVM
  hyp code), where the generated address isn't safe to acccess.

To deal with these issues, this patch introduces new
alternative_has_feature_*() helpers, which work like static branches but
are patched using alternatives. This ensures the patching is performed
at the same time as other alternative patching, allows the metadata to
be freed after patching, and is safe for use in alternative address
spaces.

Note that all supported toolchains have asm goto support, and since
commit:

  a0a12c3ed0 ("asm goto: eradicate CC_HAS_ASM_GOTO)"

... the CC_HAS_ASM_GOTO Kconfig symbol has been removed, so no feature
check is necessary, and we can always make use of asm goto.

Additionally, note that:

* This has no impact on cpus_have_cap(), which is a dynamic check.

* This has no functional impact on cpus_have_const_cap(). The branches
  are patched slightly later than before this patch, but these branches
  are not reachable until caps have been finalised.

* It is now invalid to use cpus_have_final_cap() in the window between
  feature detection and patching. All existing uses are only expected
  after patching anyway, so this should not be a problem.

* The LSE atomics will now be enabled during alternatives patching
  rather than immediately before. As the LL/SC an LSE atomics are
  functionally equivalent this should not be problematic.

When building defconfig with GCC 12.1.0, the resulting Image is 64KiB
smaller:

| % ls -al Image-*
| -rw-r--r-- 1 mark mark 37108224 Aug 23 09:56 Image-after
| -rw-r--r-- 1 mark mark 37173760 Aug 23 09:54 Image-before

According to bloat-o-meter.pl:

| add/remove: 44/34 grow/shrink: 602/1294 up/down: 39692/-61108 (-21416)
| Function                                     old     new   delta
| [...]
| Total: Before=16618336, After=16596920, chg -0.13%
| add/remove: 0/2 grow/shrink: 0/0 up/down: 0/-1296 (-1296)
| Data                                         old     new   delta
| arm64_const_caps_ready                        16       -     -16
| cpu_hwcap_keys                              1280       -   -1280
| Total: Before=8987120, After=8985824, chg -0.01%
| add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
| RO Data                                      old     new   delta
| Total: Before=18408, After=18408, chg +0.00%

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16 17:15:03 +01:00
Mark Rutland 4c0bd995d7 arm64: alternatives: have callbacks take a cap
Today, callback alternatives are special-cased within
__apply_alternatives(), and are applied alongside patching for system
capabilities as ARM64_NCAPS is not part of the boot_capabilities feature
mask.

This special-casing is less than ideal. Giving special meaning to
ARM64_NCAPS for this requires some structures and loops to use
ARM64_NCAPS + 1 (AKA ARM64_NPATCHABLE), while others use ARM64_NCAPS.
It's also not immediately clear callback alternatives are only applied
when applying alternatives for system-wide features.

To make this a bit clearer, changes the way that callback alternatives
are identified to remove the special-casing of ARM64_NCAPS, and to allow
callback alternatives to be associated with a cpucap as with all other
alternatives.

New cpucaps, ARM64_ALWAYS_BOOT and ARM64_ALWAYS_SYSTEM are added which
are always detected alongside boot cpu capabilities and system
capabilities respectively. All existing callback alternatives are made
to use ARM64_ALWAYS_SYSTEM, and so will be patched at the same point
during the boot flow as before.

Subsequent patches will make more use of these new cpucaps.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16 17:15:03 +01:00
Mark Rutland b723edf3a1 arm64: alternatives: make alt_region const
We never alter a struct alt_region after creation, and we open-code the
bounds of the kernel alternatives region in two functions. The
duplication is a bit unfortunate for clarity (and in future we're likely
to have more functions altering alternative regions), and to avoid
accidents it would be good to make the structure const.

This patch adds a shared struct `kernel_alternatives` alt_region for the
main kernel image, and marks the alt_regions as const to prevent
unintentional modification.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16 17:15:03 +01:00
Mark Rutland c5ba03260c arm64: alternatives: hoist print out of __apply_alternatives()
Printing in the middle of __apply_alternatives() is potentially unsafe
and not all that helpful given these days we practically always patch
*something*.

Hoist the print out of __apply_alternatives(), and add separate prints
to __apply_alternatives() and apply_alternatives_all(), which will make
it easier to spot if either patching call goes wrong.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16 17:15:03 +01:00
Mark Rutland 747ad8d557 arm64: alternatives: proton-pack: prepare for cap changes
The spectre patching callbacks use cpus_have_final_cap(), and subsequent
patches will make it invalid to call cpus_have_final_cap() before
alternatives patching has completed.

In preparation for said change, this patch modifies the spectre patching
callbacks use cpus_have_cap(). This is not subject to patching, and will
dynamically check the cpu_hwcaps array, which is functionally equivalent
to the existing behaviour.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16 17:15:03 +01:00
Mark Rutland 34bbfdfb14 arm64: alternatives: kvm: prepare for cap changes
The KVM patching callbacks use cpus_have_final_cap() internally within
has_vhe(), and subsequent patches will make it invalid to call
cpus_have_final_cap() before alternatives patching has completed, and
will mean that cpus_have_const_cap() will always fall back to dynamic
checks prior to alternatives patching.

In preparation for said change, this patch modifies the KVM patching
callbacks to use cpus_have_cap() directly. This is not subject to
patching, and will dynamically check the cpu_hwcaps array, which is
functionally equivalent to the existing behaviour.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16 17:15:02 +01:00
Mark Rutland 92b4b5619f arm64: cpufeature: make cpus_have_cap() noinstr-safe
Currently it isn't safe to use cpus_have_cap() from noinstr code as
test_bit() is explicitly instrumented, and were cpus_have_cap() placed
out-of-line, cpus_have_cap() itself could be instrumented.

Make cpus_have_cap() noinstr safe by marking it __always_inline and
using arch_test_bit().

Aside from the prevention of instrumentation, there should be no
functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16 17:15:02 +01:00