Sign return address for JITed code when the kernel is built with pointer
authentication enabled:
1. Sign LR with the paciasp instruction before LR is pushed to the stack.
Since paciasp acts as a landing pad for function entry, there is no need
to insert a bti instruction before it.
2. Authenticate LR with the autiasp instruction after LR is popped from
the stack.
For a BPF tail call, the stack frame constructed by the caller is reused by
the callee. That is, the stack frame is constructed by the caller and
destroyed by the callee. Thus LR is signed and pushed to the stack in the
caller's prologue, and popped from the stack and authenticated in the
callee's epilogue.
For a BPF2BPF call, the caller and callee construct their own stack frames,
and each signs and authenticates its own LR.
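An illustrative shape of the resulting code (a sketch rather than exact
JIT output; register layout aside):
  prologue:
      paciasp                       // sign LR, using SP as modifier
      stp x29, x30, [sp, #-16]!     // push FP and the signed LR
      ...
  epilogue:
      ...
      ldp x29, x30, [sp], #16       // pop FP and the signed LR
      autiasp                       // authenticate LR before returning
      ret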
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf
Link: https://lore.kernel.org/bpf/20220402073942.3782529-1-xukuohai@huawei.com
The current BPF store/load instruction is translated by the JIT into two
instructions: the first moves the immediate offset into a temporary
register; the second uses this temporary register to do the real
store/load.
In fact, arm64 supports addressing with immediate offsets, so this patch
introduces an optimization that uses the arm64 str/ldr instructions with
an immediate offset when the offset fits.
Example of generated instructions for r2 = *(u64 *)(r1 + 0):
without optimization:
mov x10, 0
ldr x1, [x0, x10]
with optimization:
ldr x1, [x0, 0]
If the offset is negative, is not aligned correctly, or exceeds the
maximum value, the JIT falls back to using a temporary register.
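For instance, a load with a negative offset such as r2 = *(u64 *)(r1 - 8)
would still take the fallback path (illustrative):
    mov x10, #-8
    ldr x1, [x0, x10]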
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-3-xukuohai@huawei.com
The "Atomics for eBPF" patch series added support for atomic[64]_fetch_add,
atomic[64]_[fetch_]{and,or,xor} and atomic[64]_{xchg,cmpxchg}, but only
for x86-64, so support these atomic operations for arm64 as well.
The implementation is basically a mechanical translation of the code
snippets in atomic_ll_sc.h, atomic_lse.h and cmpxchg.h located under
arch/arm64/include/asm.
When LSE atomics are unavailable, an extra temporary register is needed for
(BPF_ADD | BPF_FETCH) to save the value of the src register; instead of
adding TMP_REG_4, just reuse BPF_REG_AX. Also make emit_lse_atomic() an
empty inline function when CONFIG_ARM64_LSE_ATOMICS is disabled.
For both the cpus_have_cap(ARM64_HAS_LSE_ATOMICS) case and the
no-LSE-atomics case, the following three tests were exercised and passed:
"./test_verifier", "./test_progs -t atomic" and "insmod ./test_bpf.ko".
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220217072232.1186625-4-houtao1@huawei.com
This is a preparation patch for eBPF atomics support on arm64. eBPF
needs to support atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and
atomic[64]_{xchg,cmpxchg}. The ordering semantics of eBPF atomics are
the same as the implementations in the Linux kernel.
Add three helpers to support the LDCLR/LDEOR/LDSET/SWP, CAS and DMB
instructions. STADD/STCLR/STEOR/STSET are simply encoded as aliases for
LDADD/LDCLR/LDEOR/LDSET with XZR as the destination register, so no extra
helper is added. atomic_fetch_add() and other atomic ops need support for
the STLXR instruction, so extend enum aarch64_insn_ldst_type to cover it.
The LDADD/LDCLR/LDEOR/LDSET/SWP and CAS instructions are only available
when LSE atomics are enabled, so just return AARCH64_BREAK_FAULT directly
from these newly-added helpers if CONFIG_ARM64_LSE_ATOMICS is disabled.
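For example, the following two forms are the same encoding (illustrative):
    stadd x1, [x0]
    ldadd x1, xzr, [x0]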
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220217072232.1186625-3-houtao1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Support for Branch Target Identification (BTI) in user and kernel
(Mark Brown and others)
* for-next/bti: (39 commits)
arm64: vdso: Fix CFI directives in sigreturn trampoline
arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
arm64: bti: Fix support for userspace only BTI
arm64: kconfig: Update and comment GCC version check for kernel BTI
arm64: vdso: Map the vDSO text with guarded pages when built for BTI
arm64: vdso: Force the vDSO to be linked as BTI when built for BTI
arm64: vdso: Annotate for BTI
arm64: asm: Provide a mechanism for generating ELF note for BTI
arm64: bti: Provide Kconfig for kernel mode BTI
arm64: mm: Mark executable text as guarded pages
arm64: bpf: Annotate JITed code for BTI
arm64: Set GP bit in kernel page tables to enable BTI for the kernel
arm64: asm: Override SYM_FUNC_START when building the kernel with BTI
arm64: bti: Support building kernel C code using BTI
arm64: Document why we enable PAC support for leaf functions
arm64: insn: Report PAC and BTI instructions as skippable
arm64: insn: Don't assume unrecognized HINTs are skippable
arm64: insn: Provide a better name for aarch64_insn_is_nop()
arm64: insn: Add constants for new HINT instruction decode
arm64: Disable old style assembly annotations
...
The current code for BPF_{ADD,SUB} BPF_K loads the BPF immediate into a
temporary register before performing the addition/subtraction. Similarly,
BPF_JMP BPF_K cases load the immediate into a temporary register before
the comparison.
This patch introduces optimizations that use arm64 immediate add, sub,
cmn, or cmp instructions when the BPF immediate fits. If the immediate
does not fit, it falls back to using a temporary register.
Example of generated code for BPF_ALU64_IMM(BPF_ADD, R0, 2):
without optimization:
24: mov x10, #0x2
28: add x7, x7, x10
with optimization:
24: add x7, x7, #0x2
The code could use A64_{ADD,SUB}_I directly and check if it returns
AARCH64_BREAK_FAULT, similar to how logical immediates are handled.
However, aarch64_insn_gen_add_sub_imm from insn.c prints error messages
when the immediate does not fit, and it's simpler to check if the
immediate fits ahead of time.
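When the immediate does not fit in the 12-bit (optionally shifted)
encoding, the old sequence is kept; illustratively, for
BPF_ALU64_IMM(BPF_ADD, R0, 0x12345):
    mov  x10, #0x2345
    movk x10, #0x1, lsl #16
    add  x7, x7, x10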
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20200508181547.24783-4-luke.r.nels@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
The current code for BPF_{AND,OR,XOR,JSET} BPF_K loads the immediate into
a temporary register before use.
This patch changes the code to avoid using a temporary register
when the BPF immediate is encodable using an arm64 logical immediate
instruction. If the encoding fails (due to the immediate not being
encodable), it falls back to using a temporary register.
Example of generated code for BPF_ALU32_IMM(BPF_AND, R0, 0x80000001):
without optimization:
24: mov w10, #0x8000ffff
28: movk w10, #0x1
2c: and w7, w7, w10
with optimization:
24: and w7, w7, #0x80000001
Since the encoding process is quite complex, the JIT reuses existing
functionality in arch/arm64/kernel/insn.c for encoding logical immediates
rather than duplicating it in the JIT.
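Conversely, an immediate that is not a rotated, replicated run of set bits
cannot be encoded as a logical immediate and still goes through a
temporary register; illustratively, for BPF_ALU32_IMM(BPF_AND, R0,
0x12345678):
    mov  w10, #0x5678
    movk w10, #0x1234, lsl #16
    and  w7, w7, w10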
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20200508181547.24783-3-luke.r.nels@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
In order to extend the protection offered by BTI to all code executing in
kernel mode we need to annotate JITed BPF code appropriately for BTI. To
do this we need to add a landing pad to the start of each BPF function and
also immediately after the function prologue if we are emitting a function
which can be tail called. Jumps within BPF functions are all to immediate
offsets and therefore do not require landing pads.
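An illustrative shape of the resulting entry code (a sketch, not exact
JIT output):
    bti c                         // landing pad for the function entry
    stp x29, x30, [sp, #-16]!     // prologue
    ...
    bti j                         // landing pad for tail calls past the prologue
    <BPF program body>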
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200506195138.22086-6-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Optimize modulo operation code generation by using a single MSUB
instruction instead of a MUL followed by a SUB instruction.
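Illustrative code for a 64-bit r7 %= r8 (registers chosen for exposition):
  before:
    udiv x9, x7, x8
    mul  x9, x9, x8
    sub  x7, x7, x9
  after:
    udiv x9, x7, x8
    msub x7, x9, x8, x7           // x7 = x7 - x9 * x8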
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 503 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the ARMv8.1 supplement introduced LSE atomic instructions back in
2016, let's add support for STADD and use that in favor of an LDXR/STXR
loop for the XADD mapping if available. STADD is encoded as an alias for
LDADD with XZR as the destination register, therefore add LDADD to the
instruction encoder along with STADD as a special case and use it in the
JIT for CPUs that advertise LSE atomics in the CPUID register. If the
immediate offset in the BPF XADD insn is 0, then use the dst register
directly instead of a temporary one.
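Illustrative mapping for a 64-bit XADD with offset 0, address in x9 and
src in x1 (register choice for exposition):
  LDXR/STXR loop:
  1: ldxr x10, [x9]
     add  x10, x10, x1
     stxr w11, x10, [x9]
     cbnz w11, 1b
  with LSE atomics:
     stadd x1, [x9]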
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Prefetch-with-intent-to-write is currently part of the XADD mapping in
the AArch64 JIT and follows the kernel's implementation of atomic_add.
This may interfere with other threads executing the LDXR/STXR loop,
leading to potential starvation and fairness issues. Drop the optional
prefetch instruction.
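Illustratively, the dropped instruction is the write prefetch that
preceded the exclusive-load/store loop:
     prfm pstl1strm, [x9]         // dropped by this patch
  1: ldxr x10, [x9]
     ...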
Fixes: 85f68fe898 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This work implements JITing of the BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the arm64 eBPF JIT.
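Illustratively, after a cmp of dst against the src register or immediate,
the new conditions map onto A64 conditional branches as follows:
    BPF_JLT  -> b.lo
    BPF_JLE  -> b.ls
    BPF_JSLT -> b.lt
    BPF_JSLE -> b.le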
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for JMP_CALL_X (tail call) introduced by commit 04fd61ab36
("bpf: allow bpf programs to tail-call other bpf programs").
bpf_tail_call() arguments:
ctx - context pointer passed to the next program
array - pointer to a map whose type is BPF_MAP_TYPE_PROG_ARRAY
index - index inside the array that selects the specific program to run
In this implementation the arm64 JIT jumps into the callee program after
its prologue, so the callee program reuses the same stack. For
tail_call_cnt, we use the callee-saved R26 (which was already
saved/restored but previously unused by the JIT).
With this patch a tail call generates the following code on arm64:
if (index >= array->map.max_entries)
goto out;
34: mov x10, #0x10 // #16
38: ldr w10, [x1,x10]
3c: cmp w2, w10
40: b.ge 0x0000000000000074
if (tail_call_cnt > MAX_TAIL_CALL_CNT)
goto out;
tail_call_cnt++;
44: mov x10, #0x20 // #32
48: cmp x26, x10
4c: b.gt 0x0000000000000074
50: add x26, x26, #0x1
prog = array->ptrs[index];
if (prog == NULL)
goto out;
54: mov x10, #0x68 // #104
58: ldr x10, [x1,x10]
5c: ldr x11, [x10,x2]
60: cbz x11, 0x0000000000000074
goto *(prog->bpf_func + prologue_size);
64: mov x10, #0x20 // #32
68: ldr x10, [x11,x10]
6c: add x10, x10, #0x20
70: br x10
74:
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case of division by zero in a BPF program:
A = A / X; (X == 0)
the expected behavior is to terminate with return value 0.
This is confirmed by the test case introduced in commit 86bf1721b2
("test_bpf: add tests checking that JIT/interpreter sets A and X to 0.").
Reported-by: Yang Shi <yang.shi@linaro.org>
Tested-by: Yang Shi <yang.shi@linaro.org>
CC: Xi Wang <xi.wang@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: linux-arm-kernel@lists.infradead.org
CC: linux-kernel@vger.kernel.org
Fixes: e54bcde3d6 ("arm64: eBPF JIT compiler")
Cc: <stable@vger.kernel.org> # 3.18+
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Upper bits should be zeroed in endianness conversion:
- even when there's no need to change endianness (i.e., BPF_FROM_BE
on big endian or BPF_FROM_LE on little endian);
- after rev16.
This patch fixes such bugs by emitting extra instructions to clear the
upper bits.
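For example, a 16-bit BPF_FROM_BE on a little-endian CPU now zero-extends
after the byte swap (illustrative, value in w7):
    rev16 w7, w7
    uxth  w7, w7                  // clear bits 16..63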
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Fixes: e54bcde3d6 ("arm64: eBPF JIT compiler")
Cc: <stable@vger.kernel.org> # 3.18+
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 72b603ee8c ("bpf: x86: add missing 'shift by register'
instructions to x64 eBPF JIT") noted support for 'shift by register'
in eBPF and added support for it for x64. Let's enable this for arm64
as well.
The arm64 eBPF JIT compiler now passes the new 'shift by register'
test case introduced in the same commit 72b603ee8c.
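Illustrative mappings for the 64-bit variants, with dst in x7 and src in
x1 (register choice for exposition):
    BPF_LSH  | BPF_X  ->  lsl x7, x7, x1
    BPF_RSH  | BPF_X  ->  lsr x7, x7, x1
    BPF_ARSH | BPF_X  ->  asr x7, x7, x1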
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The JIT compiler emits A64 instructions. It supports eBPF only.
Legacy BPF is supported thanks to conversion by BPF core.
JIT is enabled in the same way as for other architectures:
echo 1 > /proc/sys/net/core/bpf_jit_enable
Or for additional compiler output:
echo 2 > /proc/sys/net/core/bpf_jit_enable
See Documentation/networking/filter.txt for more information.
The implementation passes all 57 tests in lib/test_bpf.c on the ARMv8
Foundation Model :) Also tested by Will on the Juno platform.
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>