Commit Graph

19 Commits

Author SHA1 Message Date
Xu Kuohai 042152c27c bpf, arm64: Sign return address for JITed code
Sign return address for JITed code when the kernel is built with pointer
authentication enabled:

1. Sign LR with paciasp instruction before LR is pushed to stack. Since
   paciasp acts like landing pads for function entry, no need to insert
   bti instruction before paciasp.

2. Authenticate LR with autiasp instruction after LR is popped from stack.

For BPF tail call, the stack frame constructed by the caller is reused by
the callee. That is, the stack frame is constructed by the caller and
destructed by the callee. Thus LR is signed and pushed to the stack in the
caller's prologue, and poped from the stack and authenticated in the
callee's epilogue.

For BPF2BPF call, the caller and callee construct their own stack frames,
and sign and authenticate their own LRs.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf
Link: https://lore.kernel.org/bpf/20220402073942.3782529-1-xukuohai@huawei.com
2022-04-06 00:04:22 +02:00
Xu Kuohai 7db6c0f1d8 bpf, arm64: Optimize BPF store/load using arm64 str/ldr(immediate offset)
The current BPF store/load instruction is translated by the JIT into two
instructions. The first instruction moves the immediate offset into a
temporary register. The second instruction uses this temporary register
to do the real store/load.

In fact, arm64 supports addressing with immediate offsets. So This patch
introduces optimization that uses arm64 str/ldr instruction with immediate
offset when the offset fits.

Example of generated instuction for r2 = *(u64 *)(r1 + 0):

without optimization:
mov x10, 0
ldr x1, [x0, x10]

with optimization:
ldr x1, [x0, 0]

If the offset is negative, or is not aligned correctly, or exceeds max
value, rollback to the use of temporary register.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-3-xukuohai@huawei.com
2022-04-01 00:27:34 +02:00
Hou Tao 1902472b4f bpf, arm64: Support more atomic operations
Atomics for eBPF patch series adds support for atomic[64]_fetch_add,
atomic[64]_[fetch_]{and,or,xor} and atomic[64]_{xchg|cmpxchg}, but it
only adds support for x86-64, so support these atomic operations for
arm64 as well.

Basically the implementation procedure is almost mechanical translation
of code snippets in atomic_ll_sc.h & atomic_lse.h & cmpxchg.h located
under arch/arm64/include/asm.

When LSE atomic is unavailable, an extra temporary register is needed for
(BPF_ADD | BPF_FETCH) to save the value of src register, instead of adding
TMP_REG_4 just use BPF_REG_AX instead. Also make emit_lse_atomic() as an
empty inline function when CONFIG_ARM64_LSE_ATOMICS is disabled.

For cpus_have_cap(ARM64_HAS_LSE_ATOMICS) case and no-LSE-ATOMICS case, the
following three tests: "./test_verifier", "./test_progs -t atomic" and
"insmod ./test_bpf.ko" are exercised and passed.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220217072232.1186625-4-houtao1@huawei.com
2022-02-28 16:27:22 +01:00
Hou Tao fa1114d9eb arm64: insn: add encoders for atomic operations
It is a preparation patch for eBPF atomic supports under arm64. eBPF
needs support atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and
atomic[64]_{xchg|cmpxchg}. The ordering semantics of eBPF atomics are
the same with the implementations in linux kernel.

Add three helpers to support LDCLR/LDEOR/LDSET/SWP, CAS and DMB
instructions. STADD/STCLR/STEOR/STSET are simply encoded as aliases for
LDADD/LDCLR/LDEOR/LDSET with XZR as the destination register, so no extra
helper is added. atomic_fetch_add() and other atomic ops needs support for
STLXR instruction, so extend enum aarch64_insn_ldst_type to do that.

LDADD/LDEOR/LDSET/SWP and CAS instructions are only available when LSE
atomics is enabled, so just return AARCH64_BREAK_FAULT directly in
these newly-added helpers if CONFIG_ARM64_LSE_ATOMICS is disabled.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220217072232.1186625-3-houtao1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22 21:25:48 +00:00
Will Deacon d27865279f Merge branch 'for-next/bti' into for-next/core
Support for Branch Target Identification (BTI) in user and kernel
(Mark Brown and others)
* for-next/bti: (39 commits)
  arm64: vdso: Fix CFI directives in sigreturn trampoline
  arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
  arm64: bti: Fix support for userspace only BTI
  arm64: kconfig: Update and comment GCC version check for kernel BTI
  arm64: vdso: Map the vDSO text with guarded pages when built for BTI
  arm64: vdso: Force the vDSO to be linked as BTI when built for BTI
  arm64: vdso: Annotate for BTI
  arm64: asm: Provide a mechanism for generating ELF note for BTI
  arm64: bti: Provide Kconfig for kernel mode BTI
  arm64: mm: Mark executable text as guarded pages
  arm64: bpf: Annotate JITed code for BTI
  arm64: Set GP bit in kernel page tables to enable BTI for the kernel
  arm64: asm: Override SYM_FUNC_START when building the kernel with BTI
  arm64: bti: Support building kernel C code using BTI
  arm64: Document why we enable PAC support for leaf functions
  arm64: insn: Report PAC and BTI instructions as skippable
  arm64: insn: Don't assume unrecognized HINTs are skippable
  arm64: insn: Provide a better name for aarch64_insn_is_nop()
  arm64: insn: Add constants for new HINT instruction decode
  arm64: Disable old style assembly annotations
  ...
2020-05-28 18:00:51 +01:00
Luke Nelson fd868f1481 bpf, arm64: Optimize ADD,SUB,JMP BPF_K using arm64 add/sub immediates
The current code for BPF_{ADD,SUB} BPF_K loads the BPF immediate to a
temporary register before performing the addition/subtraction. Similarly,
BPF_JMP BPF_K cases load the immediate to a temporary register before
comparison.

This patch introduces optimizations that use arm64 immediate add, sub,
cmn, or cmp instructions when the BPF immediate fits. If the immediate
does not fit, it falls back to using a temporary register.

Example of generated code for BPF_ALU64_IMM(BPF_ADD, R0, 2):

without optimization:

  24: mov x10, #0x2
  28: add x7, x7, x10

with optimization:

  24: add x7, x7, #0x2

The code could use A64_{ADD,SUB}_I directly and check if it returns
AARCH64_BREAK_FAULT, similar to how logical immediates are handled.
However, aarch64_insn_gen_add_sub_imm from insn.c prints error messages
when the immediate does not fit, and it's simpler to check if the
immediate fits ahead of time.

Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20200508181547.24783-4-luke.r.nels@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-05-11 12:21:39 +01:00
Luke Nelson fd49591cb4 bpf, arm64: Optimize AND,OR,XOR,JSET BPF_K using arm64 logical immediates
The current code for BPF_{AND,OR,XOR,JSET} BPF_K loads the immediate to
a temporary register before use.

This patch changes the code to avoid using a temporary register
when the BPF immediate is encodable using an arm64 logical immediate
instruction. If the encoding fails (due to the immediate not being
encodable), it falls back to using a temporary register.

Example of generated code for BPF_ALU32_IMM(BPF_AND, R0, 0x80000001):

without optimization:

  24: mov  w10, #0x8000ffff
  28: movk w10, #0x1
  2c: and  w7, w7, w10

with optimization:

  24: and  w7, w7, #0x80000001

Since the encoding process is quite complex, the JIT reuses existing
functionality in arch/arm64/kernel/insn.c for encoding logical immediates
rather than duplicate it in the JIT.

Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20200508181547.24783-3-luke.r.nels@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-05-11 12:21:39 +01:00
Mark Brown fa76cfe65c arm64: bpf: Annotate JITed code for BTI
In order to extend the protection offered by BTI to all code executing in
kernel mode we need to annotate JITed BPF code appropriately for BTI. To
do this we need to add a landing pad to the start of each BPF function and
also immediately after the function prologue if we are emitting a function
which can be tail called. Jumps within BPF functions are all to immediate
offsets and therefore do not require landing pads.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200506195138.22086-6-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-05-07 17:53:20 +01:00
Jerin Jacob 504792e07a arm64: bpf: optimize modulo operation
Optimize modulo operation instruction generation by
using single MSUB instruction vs MUL followed by SUB
instruction scheme.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-03 15:44:40 +02:00
Thomas Gleixner caab277b1d treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not see http www gnu org
  licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:07 +02:00
Daniel Borkmann 34b8ab091f bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd
Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016,
lets add support for STADD and use that in favor of LDXR / STXR loop for
the XADD mapping if available. STADD is encoded as an alias for LDADD with
XZR as the destination register, therefore add LDADD to the instruction
encoder along with STADD as special case and use it in the JIT for CPUs
that advertise LSE atomics in CPUID register. If immediate offset in the
BPF XADD insn is 0, then use dst register directly instead of temporary
one.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26 18:53:40 -07:00
Daniel Borkmann 8968c67a82 bpf, arm64: remove prefetch insn in xadd mapping
Prefetch-with-intent-to-write is currently part of the XADD mapping in
the AArch64 JIT and follows the kernel's implementation of atomic_add.
This may interfere with other threads executing the LDXR/STXR loop,
leading to potential starvation and fairness issues. Drop the optional
prefetch instruction.

Fixes: 85f68fe898 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26 18:53:15 -07:00
Daniel Borkmann c362b2f34e bpf, arm64: implement jiting of BPF_J{LT, LE, SLT, SLE}
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the arm64 eBPF JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09 16:53:56 -07:00
Daniel Borkmann 85f68fe898 bpf, arm64: implement jiting of BPF_XADD
This work adds BPF_XADD for BPF_W/BPF_DW to the arm64 JIT and therefore
completes JITing of all BPF instructions, meaning we can thus also remove
the 'notyet' label and do not need to fall back to the interpreter when
BPF_XADD is used in a program!

This now also brings arm64 JIT in line with x86_64, s390x, ppc64, sparc64,
where all current eBPF features are supported.

BPF_W example from test_bpf:

  .u.insns_int = {
    BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
    BPF_ST_MEM(BPF_W, R10, -40, 0x10),
    BPF_STX_XADD(BPF_W, R10, R0, -40),
    BPF_LDX_MEM(BPF_W, R0, R10, -40),
    BPF_EXIT_INSN(),
  },

  [...]
  00000020:  52800247  mov w7, #0x12 // #18
  00000024:  928004eb  mov x11, #0xffffffffffffffd8 // #-40
  00000028:  d280020a  mov x10, #0x10 // #16
  0000002c:  b82b6b2a  str w10, [x25,x11]
  // start of xadd mapping:
  00000030:  928004ea  mov x10, #0xffffffffffffffd8 // #-40
  00000034:  8b19014a  add x10, x10, x25
  00000038:  f9800151  prfm pstl1strm, [x10]
  0000003c:  885f7d4b  ldxr w11, [x10]
  00000040:  0b07016b  add w11, w11, w7
  00000044:  880b7d4b  stxr w11, w11, [x10]
  00000048:  35ffffab  cbnz w11, 0x0000003c
  // end of xadd mapping:
  [...]

BPF_DW example from test_bpf:

  .u.insns_int = {
    BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
    BPF_ST_MEM(BPF_DW, R10, -40, 0x10),
    BPF_STX_XADD(BPF_DW, R10, R0, -40),
    BPF_LDX_MEM(BPF_DW, R0, R10, -40),
    BPF_EXIT_INSN(),
  },

  [...]
  00000020:  52800247  mov w7,  #0x12 // #18
  00000024:  928004eb  mov x11, #0xffffffffffffffd8 // #-40
  00000028:  d280020a  mov x10, #0x10 // #16
  0000002c:  f82b6b2a  str x10, [x25,x11]
  // start of xadd mapping:
  00000030:  928004ea  mov x10, #0xffffffffffffffd8 // #-40
  00000034:  8b19014a  add x10, x10, x25
  00000038:  f9800151  prfm pstl1strm, [x10]
  0000003c:  c85f7d4b  ldxr x11, [x10]
  00000040:  8b07016b  add x11, x11, x7
  00000044:  c80b7d4b  stxr w11, x11, [x10]
  00000048:  35ffffab  cbnz w11, 0x0000003c
  // end of xadd mapping:
  [...]

Tested on Cavium ThunderX ARMv8, test suite results after the patch:

  No JIT:   [ 3751.855362] test_bpf: Summary: 311 PASSED, 0 FAILED, [0/303 JIT'ed]
  With JIT: [ 3573.759527] test_bpf: Summary: 311 PASSED, 0 FAILED, [303/303 JIT'ed]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-02 15:04:50 -04:00
Zi Shen Lim ddb55992b0 arm64: bpf: implement bpf_tail_call() helper
Add support for JMP_CALL_X (tail call) introduced by commit 04fd61ab36
("bpf: allow bpf programs to tail-call other bpf programs").

bpf_tail_call() arguments:
  ctx   - context pointer passed to next program
  array - pointer to map which type is BPF_MAP_TYPE_PROG_ARRAY
  index - index inside array that selects specific program to run

In this implementation arm64 JIT jumps into callee program after prologue,
so callee program reuses the same stack. For tail_call_cnt, we use the
callee-saved R26 (which was already saved/restored but previously unused
by JIT).

With this patch a tail call generates the following code on arm64:

  if (index >= array->map.max_entries)
      goto out;

  34:   mov     x10, #0x10                      // #16
  38:   ldr     w10, [x1,x10]
  3c:   cmp     w2, w10
  40:   b.ge    0x0000000000000074

  if (tail_call_cnt > MAX_TAIL_CALL_CNT)
      goto out;
  tail_call_cnt++;

  44:   mov     x10, #0x20                      // #32
  48:   cmp     x26, x10
  4c:   b.gt    0x0000000000000074
  50:   add     x26, x26, #0x1

  prog = array->ptrs[index];
  if (prog == NULL)
      goto out;

  54:   mov     x10, #0x68                      // #104
  58:   ldr     x10, [x1,x10]
  5c:   ldr     x11, [x10,x2]
  60:   cbz     x11, 0x0000000000000074

  goto *(prog->bpf_func + prologue_size);

  64:   mov     x10, #0x20                      // #32
  68:   ldr     x10, [x11,x10]
  6c:   add     x10, x10, #0x20
  70:   br      x10
  74:

Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:11:49 -07:00
Zi Shen Lim 251599e1d6 arm64: bpf: fix div-by-zero case
In the case of division by zero in a BPF program:
	A = A / X;  (X == 0)
the expected behavior is to terminate with return value 0.

This is confirmed by the test case introduced in commit 86bf1721b2
("test_bpf: add tests checking that JIT/interpreter sets A and X to 0.").

Reported-by: Yang Shi <yang.shi@linaro.org>
Tested-by: Yang Shi <yang.shi@linaro.org>
CC: Xi Wang <xi.wang@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: linux-arm-kernel@lists.infradead.org
CC: linux-kernel@vger.kernel.org
Fixes: e54bcde3d6 ("arm64: eBPF JIT compiler")
Cc: <stable@vger.kernel.org> # 3.18+
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-11-06 16:58:36 +00:00
Xi Wang d63903bbc3 arm64: bpf: fix endianness conversion bugs
Upper bits should be zeroed in endianness conversion:

- even when there's no need to change endianness (i.e., BPF_FROM_BE
  on big endian or BPF_FROM_LE on little endian);

- after rev16.

This patch fixes such bugs by emitting extra instructions to clear
upper bits.

Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Fixes: e54bcde3d6 ("arm64: eBPF JIT compiler")
Cc: <stable@vger.kernel.org> # 3.18+
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-26 14:15:39 +01:00
Zi Shen Lim d65a634a0a arm64: bpf: add 'shift by register' instructions
Commit 72b603ee8c ("bpf: x86: add missing 'shift by register'
instructions to x64 eBPF JIT") noted support for 'shift by register'
in eBPF and added support for it for x64. Let's enable this for arm64
as well.

The arm64 eBPF JIT compiler now passes the new 'shift by register'
test case introduced in the same commit 72b603ee8c.

Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-10-20 17:47:03 +01:00
Zi Shen Lim e54bcde3d6 arm64: eBPF JIT compiler
The JIT compiler emits A64 instructions. It supports eBPF only.
Legacy BPF is supported thanks to conversion by BPF core.

JIT is enabled in the same way as for other architectures:

	echo 1 > /proc/sys/net/core/bpf_jit_enable

Or for additional compiler output:

	echo 2 > /proc/sys/net/core/bpf_jit_enable

See Documentation/networking/filter.txt for more information.

The implementation passes all 57 tests in lib/test_bpf.c
on ARMv8 Foundation Model :) Also tested by Will on Juno platform.

Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-09-08 14:39:21 +01:00