OpenCloudOS-Kernel/arch/riscv/net/bpf_jit_comp64.c

1155 lines
28 KiB
C
Raw Normal View History

bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
// SPDX-License-Identifier: GPL-2.0
/* BPF JIT compiler for RV64G
*
* Copyright(c) 2019 Björn Töpel <bjorn.topel@gmail.com>
*
*/
#include <linux/bpf.h>
#include <linux/filter.h>
#include "bpf_jit.h"
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
#define RV_REG_TCC RV_REG_A6
#define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
static const int regmap[] = {
[BPF_REG_0] = RV_REG_A5,
[BPF_REG_1] = RV_REG_A0,
[BPF_REG_2] = RV_REG_A1,
[BPF_REG_3] = RV_REG_A2,
[BPF_REG_4] = RV_REG_A3,
[BPF_REG_5] = RV_REG_A4,
[BPF_REG_6] = RV_REG_S1,
[BPF_REG_7] = RV_REG_S2,
[BPF_REG_8] = RV_REG_S3,
[BPF_REG_9] = RV_REG_S4,
[BPF_REG_FP] = RV_REG_S5,
[BPF_REG_AX] = RV_REG_T0,
};
enum {
RV_CTX_F_SEEN_TAIL_CALL = 0,
RV_CTX_F_SEEN_CALL = RV_REG_RA,
RV_CTX_F_SEEN_S1 = RV_REG_S1,
RV_CTX_F_SEEN_S2 = RV_REG_S2,
RV_CTX_F_SEEN_S3 = RV_REG_S3,
RV_CTX_F_SEEN_S4 = RV_REG_S4,
RV_CTX_F_SEEN_S5 = RV_REG_S5,
RV_CTX_F_SEEN_S6 = RV_REG_S6,
};
static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
{
u8 reg = regmap[bpf_reg];
switch (reg) {
case RV_CTX_F_SEEN_S1:
case RV_CTX_F_SEEN_S2:
case RV_CTX_F_SEEN_S3:
case RV_CTX_F_SEEN_S4:
case RV_CTX_F_SEEN_S5:
case RV_CTX_F_SEEN_S6:
__set_bit(reg, &ctx->flags);
}
return reg;
};
static bool seen_reg(int reg, struct rv_jit_context *ctx)
{
switch (reg) {
case RV_CTX_F_SEEN_CALL:
case RV_CTX_F_SEEN_S1:
case RV_CTX_F_SEEN_S2:
case RV_CTX_F_SEEN_S3:
case RV_CTX_F_SEEN_S4:
case RV_CTX_F_SEEN_S5:
case RV_CTX_F_SEEN_S6:
return test_bit(reg, &ctx->flags);
}
return false;
}
static void mark_fp(struct rv_jit_context *ctx)
{
__set_bit(RV_CTX_F_SEEN_S5, &ctx->flags);
}
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
static void mark_call(struct rv_jit_context *ctx)
{
__set_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
}
static bool seen_call(struct rv_jit_context *ctx)
{
return test_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
}
static void mark_tail_call(struct rv_jit_context *ctx)
{
__set_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
}
static bool seen_tail_call(struct rv_jit_context *ctx)
{
return test_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
}
static u8 rv_tail_call_reg(struct rv_jit_context *ctx)
{
mark_tail_call(ctx);
if (seen_call(ctx)) {
__set_bit(RV_CTX_F_SEEN_S6, &ctx->flags);
return RV_REG_S6;
}
return RV_REG_A6;
}
static bool is_32b_int(s64 val)
{
return -(1L << 31) <= val && val < (1L << 31);
}
riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com
2020-04-07 06:16:04 +08:00
static bool in_auipc_jalr_range(s64 val)
{
/*
* auipc+jalr can reach any signed PC-relative offset in the range
* [-2^31 - 2^11, 2^31 - 2^11).
*/
return (-(1L << 31) - (1L << 11)) <= val &&
val < ((1L << 31) - (1L << 11));
}
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
static void emit_imm(u8 rd, s64 val, struct rv_jit_context *ctx)
{
/* Note that the immediate from the add is sign-extended,
* which means that we need to compensate this by adding 2^12,
* when the 12th bit is set. A simpler way of doing this, and
* getting rid of the check, is to just add 2**11 before the
* shift. The "Loading a 32-Bit constant" example from the
* "Computer Organization and Design, RISC-V edition" book by
* Patterson/Hennessy highlights this fact.
*
* This also means that we need to process LSB to MSB.
*/
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
s64 upper = (val + (1 << 11)) >> 12;
/* Sign-extend lower 12 bits to 64 bits since immediates for li, addiw,
* and addi are signed and RVC checks will perform signed comparisons.
*/
s64 lower = ((val & 0xfff) << 52) >> 52;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
int shift;
if (is_32b_int(val)) {
if (upper)
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_lui(rd, upper, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
if (!upper) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_li(rd, lower, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
return;
}
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addiw(rd, rd, lower, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
return;
}
shift = __ffs(upper);
upper >>= shift;
shift += 12;
emit_imm(rd, upper, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_slli(rd, rd, shift, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
if (lower)
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addi(rd, rd, lower, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
{
int stack_adjust = ctx->stack_size, store_offset = stack_adjust - 8;
if (seen_reg(RV_REG_RA, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_RA, store_offset, RV_REG_SP, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_FP, store_offset, RV_REG_SP, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
if (seen_reg(RV_REG_S1, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_S1, store_offset, RV_REG_SP, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S2, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_S2, store_offset, RV_REG_SP, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S3, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_S3, store_offset, RV_REG_SP, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S4, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_S4, store_offset, RV_REG_SP, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S5, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_S5, store_offset, RV_REG_SP, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S6, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
/* Set return value. */
if (!is_tail_call)
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_mv(RV_REG_A0, RV_REG_A5, ctx);
emit_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA,
is_tail_call ? 4 : 0, /* skip TCC init */
ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
static void emit_bcc(u8 cond, u8 rd, u8 rs, int rvoff,
struct rv_jit_context *ctx)
{
switch (cond) {
case BPF_JEQ:
emit(rv_beq(rd, rs, rvoff >> 1), ctx);
return;
case BPF_JGT:
emit(rv_bltu(rs, rd, rvoff >> 1), ctx);
return;
case BPF_JLT:
emit(rv_bltu(rd, rs, rvoff >> 1), ctx);
return;
case BPF_JGE:
emit(rv_bgeu(rd, rs, rvoff >> 1), ctx);
return;
case BPF_JLE:
emit(rv_bgeu(rs, rd, rvoff >> 1), ctx);
return;
case BPF_JNE:
emit(rv_bne(rd, rs, rvoff >> 1), ctx);
return;
case BPF_JSGT:
emit(rv_blt(rs, rd, rvoff >> 1), ctx);
return;
case BPF_JSLT:
emit(rv_blt(rd, rs, rvoff >> 1), ctx);
return;
case BPF_JSGE:
emit(rv_bge(rd, rs, rvoff >> 1), ctx);
return;
case BPF_JSLE:
emit(rv_bge(rs, rd, rvoff >> 1), ctx);
}
}
static void emit_branch(u8 cond, u8 rd, u8 rs, int rvoff,
struct rv_jit_context *ctx)
{
s64 upper, lower;
if (is_13b_int(rvoff)) {
emit_bcc(cond, rd, rs, rvoff, ctx);
return;
}
/* Adjust for jal */
rvoff -= 4;
/* Transform, e.g.:
* bne rd,rs,foo
* to
* beq rd,rs,<.L1>
* (auipc foo)
* jal(r) foo
* .L1
*/
cond = invert_bpf_cond(cond);
if (is_21b_int(rvoff)) {
emit_bcc(cond, rd, rs, 8, ctx);
emit(rv_jal(RV_REG_ZERO, rvoff >> 1), ctx);
return;
}
/* 32b No need for an additional rvoff adjustment, since we
* get that from the auipc at PC', where PC = PC' + 4.
*/
upper = (rvoff + (1 << 11)) >> 12;
lower = rvoff & 0xfff;
emit_bcc(cond, rd, rs, 12, ctx);
emit(rv_auipc(RV_REG_T1, upper), ctx);
emit(rv_jalr(RV_REG_ZERO, RV_REG_T1, lower), ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
static void emit_zext_32(u8 reg, struct rv_jit_context *ctx)
{
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_slli(reg, reg, 32, ctx);
emit_srli(reg, reg, 32, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
{
int tc_ninsn, off, start_insn = ctx->ninsns;
u8 tcc = rv_tail_call_reg(ctx);
/* a0: &ctx
* a1: &array
* a2: index
*
* if (index >= array->map.max_entries)
* goto out;
*/
tc_ninsn = insn ? ctx->offset[insn] - ctx->offset[insn - 1] :
ctx->offset[0];
emit_zext_32(RV_REG_A2, ctx);
off = offsetof(struct bpf_array, map.max_entries);
if (is_12b_check(off, insn))
return -1;
emit(rv_lwu(RV_REG_T1, off, RV_REG_A1), ctx);
off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
emit_branch(BPF_JGE, RV_REG_A2, RV_REG_T1, off, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
/* if (TCC-- < 0)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
* goto out;
*/
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addi(RV_REG_T1, tcc, -1, ctx);
off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2019-12-27 The following pull-request contains BPF updates for your *net-next* tree. We've added 127 non-merge commits during the last 17 day(s) which contain a total of 110 files changed, 6901 insertions(+), 2721 deletions(-). There are three merge conflicts. Conflicts and resolution looks as follows: 1) Merge conflict in net/bpf/test_run.c: There was a tree-wide cleanup c593642c8be0 ("treewide: Use sizeof_field() macro") which gets in the way with b590cb5f802d ("bpf: Switch to offsetofend in BPF_PROG_TEST_RUN"): <<<<<<< HEAD if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) + sizeof_field(struct __sk_buff, priority), ======= if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority), >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c There are a few occasions that look similar to this. Always take the chunk with offsetofend(). Note that there is one where the fields differ in here: <<<<<<< HEAD if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) + sizeof_field(struct __sk_buff, tstamp), ======= if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs), >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Just take the one with offsetofend() /and/ gso_segs. Latter is correct due to 850a88cc4096 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN"). 2) Merge conflict in arch/riscv/net/bpf_jit_comp.c: (I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.) <<<<<<< HEAD if (is_13b_check(off, insn)) return -1; emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx); ======= emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx); >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Result should look like: emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx); 3) Merge conflict in arch/riscv/include/asm/pgtable.h: <<<<<<< HEAD ======= #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) #define VMALLOC_END (PAGE_OFFSET - 1) #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE) #define BPF_JIT_REGION_SIZE (SZ_128M) #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) #define BPF_JIT_REGION_END (VMALLOC_END) /* * Roughly size the vmemmap space to be large enough to fit enough * struct pages to map half the virtual address space. Then * position vmemmap directly below the VMALLOC region. */ #define VMEMMAP_SHIFT \ (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT) #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT) #define VMEMMAP_END (VMALLOC_START - 1) #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE) #define vmemmap ((struct page *)VMEMMAP_START) >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Only take the BPF_* defines from there and move them higher up in the same file. Remove the rest from the chunk. The VMALLOC_* etc defines got moved via 01f52e16b868 ("riscv: define vmemmap before pfn_to_page calls"). Result: [...] #define __S101 PAGE_READ_EXEC #define __S110 PAGE_SHARED_EXEC #define __S111 PAGE_SHARED_EXEC #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) #define VMALLOC_END (PAGE_OFFSET - 1) #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE) #define BPF_JIT_REGION_SIZE (SZ_128M) #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) #define BPF_JIT_REGION_END (VMALLOC_END) /* * Roughly size the vmemmap space to be large enough to fit enough * struct pages to map half the virtual address space. Then * position vmemmap directly below the VMALLOC region. */ #define VMEMMAP_SHIFT \ (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT) #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT) #define VMEMMAP_END (VMALLOC_START - 1) #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE) [...] Let me know if there are any other issues. Anyway, the main changes are: 1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific to a provided BPF object file. This provides an alternative, simplified API compared to standard libbpf interaction. Also, add libbpf extern variable resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko. 2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also, add various BPF riscv JIT improvements, from Björn Töpel. 3) Extend bpftool to allow matching BPF programs and maps by name, from Paul Chaignon. 4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI flag for allowing updates without service interruption, from Andrey Ignatov. 5) Cleanup and simplification of ring access functions for AF_XDP with a bonus of 0-5% performance improvement, from Magnus Karlsson. 6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of audit support for BPF, from Daniel Borkmann and latter with Jiri Olsa. 7) Move and extend test_select_reuseport into BPF program tests under BPF selftests, from Jakub Sitnicki. 8) Various BPF sample improvements for xdpsock for customizing parameters to set up and benchmark AF_XDP, from Jay Jayatheerthan. 9) Improve libbpf to provide a ulimit hint on permission denied errors. Also change XDP sample programs to attach in driver mode by default, from Toke Høiland-Jørgensen. 10) Extend BPF test infrastructure to allow changing skb mark from tc BPF programs, from Nikita V. Shirokov. 11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King. 12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after libbpf conversion, from Jesper Dangaard Brouer. 13) Minor misc improvements from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-28 06:20:10 +08:00
emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
/* prog = array->ptrs[index];
* if (!prog)
* goto out;
*/
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_slli(RV_REG_T2, RV_REG_A2, 3, ctx);
emit_add(RV_REG_T2, RV_REG_T2, RV_REG_A1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
off = offsetof(struct bpf_array, ptrs);
if (is_12b_check(off, insn))
return -1;
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_T2, off, RV_REG_T2, ctx);
off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
emit_branch(BPF_JEQ, RV_REG_T2, RV_REG_ZERO, off, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
/* goto *(prog->bpf_func + 4); */
off = offsetof(struct bpf_prog, bpf_func);
if (is_12b_check(off, insn))
return -1;
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(RV_REG_T3, off, RV_REG_T2, ctx);
emit_mv(RV_REG_TCC, RV_REG_T1, ctx);
__build_epilogue(true, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
return 0;
}
static void init_regs(u8 *rd, u8 *rs, const struct bpf_insn *insn,
struct rv_jit_context *ctx)
{
u8 code = insn->code;
switch (code) {
case BPF_JMP | BPF_JA:
case BPF_JMP | BPF_CALL:
case BPF_JMP | BPF_EXIT:
case BPF_JMP | BPF_TAIL_CALL:
break;
default:
*rd = bpf_to_rv_reg(insn->dst_reg, ctx);
}
if (code & (BPF_ALU | BPF_X) || code & (BPF_ALU64 | BPF_X) ||
code & (BPF_JMP | BPF_X) || code & (BPF_JMP32 | BPF_X) ||
code & BPF_LDX || code & BPF_STX)
*rs = bpf_to_rv_reg(insn->src_reg, ctx);
}
static void emit_zext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
{
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_mv(RV_REG_T2, *rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(RV_REG_T2, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_mv(RV_REG_T1, *rs, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(RV_REG_T1, ctx);
*rd = RV_REG_T2;
*rs = RV_REG_T1;
}
static void emit_sext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
{
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addiw(RV_REG_T2, *rd, 0, ctx);
emit_addiw(RV_REG_T1, *rs, 0, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
*rd = RV_REG_T2;
*rs = RV_REG_T1;
}
static void emit_zext_32_rd_t1(u8 *rd, struct rv_jit_context *ctx)
{
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_mv(RV_REG_T2, *rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(RV_REG_T2, ctx);
emit_zext_32(RV_REG_T1, ctx);
*rd = RV_REG_T2;
}
static void emit_sext_32_rd(u8 *rd, struct rv_jit_context *ctx)
{
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addiw(RV_REG_T2, *rd, 0, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
*rd = RV_REG_T2;
}
riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com
2020-04-07 06:16:04 +08:00
static int emit_jump_and_link(u8 rd, s64 rvoff, bool force_jalr,
struct rv_jit_context *ctx)
{
s64 upper, lower;
if (rvoff && is_21b_int(rvoff) && !force_jalr) {
emit(rv_jal(rd, rvoff >> 1), ctx);
riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com
2020-04-07 06:16:04 +08:00
return 0;
} else if (in_auipc_jalr_range(rvoff)) {
upper = (rvoff + (1 << 11)) >> 12;
lower = rvoff & 0xfff;
emit(rv_auipc(RV_REG_T1, upper), ctx);
emit(rv_jalr(rd, RV_REG_T1, lower), ctx);
return 0;
}
riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com
2020-04-07 06:16:04 +08:00
pr_err("bpf-jit: target offset 0x%llx is out of range\n", rvoff);
return -ERANGE;
}
static bool is_signed_bpf_cond(u8 cond)
{
return cond == BPF_JSGT || cond == BPF_JSLT ||
cond == BPF_JSGE || cond == BPF_JSLE;
}
static int emit_call(bool fixed, u64 addr, struct rv_jit_context *ctx)
{
s64 off = 0;
u64 ip;
u8 rd;
riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com
2020-04-07 06:16:04 +08:00
int ret;
if (addr && ctx->insns) {
ip = (u64)(long)(ctx->insns + ctx->ninsns);
off = addr - ip;
}
riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com
2020-04-07 06:16:04 +08:00
ret = emit_jump_and_link(RV_REG_RA, off, !fixed, ctx);
if (ret)
return ret;
rd = bpf_to_rv_reg(BPF_REG_0, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_mv(rd, RV_REG_A0, ctx);
return 0;
}
int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
bool extra_pass)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
{
bool is64 = BPF_CLASS(insn->code) == BPF_ALU64 ||
BPF_CLASS(insn->code) == BPF_JMP;
riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com
2020-04-07 06:16:04 +08:00
int s, e, rvoff, ret, i = insn - ctx->prog->insnsi;
struct bpf_prog_aux *aux = ctx->prog->aux;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
u8 rd = -1, rs = -1, code = insn->code;
s16 off = insn->off;
s32 imm = insn->imm;
init_regs(&rd, &rs, insn, ctx);
switch (code) {
/* dst = src */
case BPF_ALU | BPF_MOV | BPF_X:
case BPF_ALU64 | BPF_MOV | BPF_X:
if (imm == 1) {
/* Special mov32 for zext */
emit_zext_32(rd, ctx);
break;
}
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_mv(rd, rs, ctx);
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
/* dst = dst OP src */
case BPF_ALU | BPF_ADD | BPF_X:
case BPF_ALU64 | BPF_ADD | BPF_X:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(rd, rd, rs, ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ALU | BPF_SUB | BPF_X:
case BPF_ALU64 | BPF_SUB | BPF_X:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
if (is64)
emit_sub(rd, rd, rs, ctx);
else
emit_subw(rd, rd, rs, ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ALU | BPF_AND | BPF_X:
case BPF_ALU64 | BPF_AND | BPF_X:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_and(rd, rd, rs, ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ALU | BPF_OR | BPF_X:
case BPF_ALU64 | BPF_OR | BPF_X:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_or(rd, rd, rs, ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ALU | BPF_XOR | BPF_X:
case BPF_ALU64 | BPF_XOR | BPF_X:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_xor(rd, rd, rs, ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ALU | BPF_MUL | BPF_X:
case BPF_ALU64 | BPF_MUL | BPF_X:
emit(is64 ? rv_mul(rd, rd, rs) : rv_mulw(rd, rd, rs), ctx);
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_DIV | BPF_X:
case BPF_ALU64 | BPF_DIV | BPF_X:
emit(is64 ? rv_divu(rd, rd, rs) : rv_divuw(rd, rd, rs), ctx);
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_MOD | BPF_X:
case BPF_ALU64 | BPF_MOD | BPF_X:
emit(is64 ? rv_remu(rd, rd, rs) : rv_remuw(rd, rd, rs), ctx);
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_LSH | BPF_X:
case BPF_ALU64 | BPF_LSH | BPF_X:
emit(is64 ? rv_sll(rd, rd, rs) : rv_sllw(rd, rd, rs), ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ALU | BPF_RSH | BPF_X:
case BPF_ALU64 | BPF_RSH | BPF_X:
emit(is64 ? rv_srl(rd, rd, rs) : rv_srlw(rd, rd, rs), ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ALU | BPF_ARSH | BPF_X:
case BPF_ALU64 | BPF_ARSH | BPF_X:
emit(is64 ? rv_sra(rd, rd, rs) : rv_sraw(rd, rd, rs), ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* dst = -dst */
case BPF_ALU | BPF_NEG:
case BPF_ALU64 | BPF_NEG:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sub(rd, RV_REG_ZERO, rd, ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* dst = BSWAP##imm(dst) */
case BPF_ALU | BPF_END | BPF_FROM_LE:
switch (imm) {
case 16:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_slli(rd, rd, 48, ctx);
emit_srli(rd, rd, 48, ctx);
break;
case 32:
if (!aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case 64:
/* Do nothing */
break;
}
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
case BPF_ALU | BPF_END | BPF_FROM_BE:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_li(RV_REG_T2, 0, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_andi(RV_REG_T1, rd, 0xff, ctx);
emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
emit_srli(rd, rd, 8, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
if (imm == 16)
goto out_be;
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_andi(RV_REG_T1, rd, 0xff, ctx);
emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
emit_srli(rd, rd, 8, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_andi(RV_REG_T1, rd, 0xff, ctx);
emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
emit_srli(rd, rd, 8, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
if (imm == 32)
goto out_be;
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_andi(RV_REG_T1, rd, 0xff, ctx);
emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
emit_srli(rd, rd, 8, ctx);
emit_andi(RV_REG_T1, rd, 0xff, ctx);
emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
emit_srli(rd, rd, 8, ctx);
emit_andi(RV_REG_T1, rd, 0xff, ctx);
emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
emit_srli(rd, rd, 8, ctx);
emit_andi(RV_REG_T1, rd, 0xff, ctx);
emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
emit_srli(rd, rd, 8, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
out_be:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_andi(RV_REG_T1, rd, 0xff, ctx);
emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_mv(rd, RV_REG_T2, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* dst = imm */
case BPF_ALU | BPF_MOV | BPF_K:
case BPF_ALU64 | BPF_MOV | BPF_K:
emit_imm(rd, imm, ctx);
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
/* dst = dst OP imm */
case BPF_ALU | BPF_ADD | BPF_K:
case BPF_ALU64 | BPF_ADD | BPF_K:
if (is_12b_int(imm)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addi(rd, rd, imm, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
} else {
emit_imm(RV_REG_T1, imm, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(rd, rd, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_SUB | BPF_K:
case BPF_ALU64 | BPF_SUB | BPF_K:
if (is_12b_int(-imm)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addi(rd, rd, -imm, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
} else {
emit_imm(RV_REG_T1, imm, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sub(rd, rd, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_AND | BPF_K:
case BPF_ALU64 | BPF_AND | BPF_K:
if (is_12b_int(imm)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_andi(rd, rd, imm, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
} else {
emit_imm(RV_REG_T1, imm, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_and(rd, rd, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_OR | BPF_K:
case BPF_ALU64 | BPF_OR | BPF_K:
if (is_12b_int(imm)) {
emit(rv_ori(rd, rd, imm), ctx);
} else {
emit_imm(RV_REG_T1, imm, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_or(rd, rd, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_XOR | BPF_K:
case BPF_ALU64 | BPF_XOR | BPF_K:
if (is_12b_int(imm)) {
emit(rv_xori(rd, rd, imm), ctx);
} else {
emit_imm(RV_REG_T1, imm, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_xor(rd, rd, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_MUL | BPF_K:
case BPF_ALU64 | BPF_MUL | BPF_K:
emit_imm(RV_REG_T1, imm, ctx);
emit(is64 ? rv_mul(rd, rd, RV_REG_T1) :
rv_mulw(rd, rd, RV_REG_T1), ctx);
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_DIV | BPF_K:
case BPF_ALU64 | BPF_DIV | BPF_K:
emit_imm(RV_REG_T1, imm, ctx);
emit(is64 ? rv_divu(rd, rd, RV_REG_T1) :
rv_divuw(rd, rd, RV_REG_T1), ctx);
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_MOD | BPF_K:
case BPF_ALU64 | BPF_MOD | BPF_K:
emit_imm(RV_REG_T1, imm, ctx);
emit(is64 ? rv_remu(rd, rd, RV_REG_T1) :
rv_remuw(rd, rd, RV_REG_T1), ctx);
if (!is64 && !aux->verifier_zext)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_LSH | BPF_K:
case BPF_ALU64 | BPF_LSH | BPF_K:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_slli(rd, rd, imm, ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ALU | BPF_RSH | BPF_K:
case BPF_ALU64 | BPF_RSH | BPF_K:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
if (is64)
emit_srli(rd, rd, imm, ctx);
else
emit(rv_srliw(rd, rd, imm), ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ALU | BPF_ARSH | BPF_K:
case BPF_ALU64 | BPF_ARSH | BPF_K:
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
if (is64)
emit_srai(rd, rd, imm, ctx);
else
emit(rv_sraiw(rd, rd, imm), ctx);
if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* JUMP off */
case BPF_JMP | BPF_JA:
rvoff = rv_offset(i, off, ctx);
riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com
2020-04-07 06:16:04 +08:00
ret = emit_jump_and_link(RV_REG_ZERO, rvoff, false, ctx);
if (ret)
return ret;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* IF (dst COND src) JUMP off */
case BPF_JMP | BPF_JEQ | BPF_X:
case BPF_JMP32 | BPF_JEQ | BPF_X:
case BPF_JMP | BPF_JGT | BPF_X:
case BPF_JMP32 | BPF_JGT | BPF_X:
case BPF_JMP | BPF_JLT | BPF_X:
case BPF_JMP32 | BPF_JLT | BPF_X:
case BPF_JMP | BPF_JGE | BPF_X:
case BPF_JMP32 | BPF_JGE | BPF_X:
case BPF_JMP | BPF_JLE | BPF_X:
case BPF_JMP32 | BPF_JLE | BPF_X:
case BPF_JMP | BPF_JNE | BPF_X:
case BPF_JMP32 | BPF_JNE | BPF_X:
case BPF_JMP | BPF_JSGT | BPF_X:
case BPF_JMP32 | BPF_JSGT | BPF_X:
case BPF_JMP | BPF_JSLT | BPF_X:
case BPF_JMP32 | BPF_JSLT | BPF_X:
case BPF_JMP | BPF_JSGE | BPF_X:
case BPF_JMP32 | BPF_JSGE | BPF_X:
case BPF_JMP | BPF_JSLE | BPF_X:
case BPF_JMP32 | BPF_JSLE | BPF_X:
case BPF_JMP | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_X:
rvoff = rv_offset(i, off, ctx);
if (!is64) {
s = ctx->ninsns;
if (is_signed_bpf_cond(BPF_OP(code)))
emit_sext_32_rd_rs(&rd, &rs, ctx);
else
emit_zext_32_rd_rs(&rd, &rs, ctx);
e = ctx->ninsns;
/* Adjust for extra insns */
rvoff -= ninsns_rvoff(e - s);
}
if (BPF_OP(code) == BPF_JSET) {
/* Adjust for and */
rvoff -= 4;
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_and(RV_REG_T1, rd, rs, ctx);
emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff,
ctx);
} else {
emit_branch(BPF_OP(code), rd, rs, rvoff, ctx);
}
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* IF (dst COND imm) JUMP off */
case BPF_JMP | BPF_JEQ | BPF_K:
case BPF_JMP32 | BPF_JEQ | BPF_K:
case BPF_JMP | BPF_JGT | BPF_K:
case BPF_JMP32 | BPF_JGT | BPF_K:
case BPF_JMP | BPF_JLT | BPF_K:
case BPF_JMP32 | BPF_JLT | BPF_K:
case BPF_JMP | BPF_JGE | BPF_K:
case BPF_JMP32 | BPF_JGE | BPF_K:
case BPF_JMP | BPF_JLE | BPF_K:
case BPF_JMP32 | BPF_JLE | BPF_K:
case BPF_JMP | BPF_JNE | BPF_K:
case BPF_JMP32 | BPF_JNE | BPF_K:
case BPF_JMP | BPF_JSGT | BPF_K:
case BPF_JMP32 | BPF_JSGT | BPF_K:
case BPF_JMP | BPF_JSLT | BPF_K:
case BPF_JMP32 | BPF_JSLT | BPF_K:
case BPF_JMP | BPF_JSGE | BPF_K:
case BPF_JMP32 | BPF_JSGE | BPF_K:
case BPF_JMP | BPF_JSLE | BPF_K:
case BPF_JMP32 | BPF_JSLE | BPF_K:
rvoff = rv_offset(i, off, ctx);
s = ctx->ninsns;
if (imm) {
emit_imm(RV_REG_T1, imm, ctx);
rs = RV_REG_T1;
} else {
/* If imm is 0, simply use zero register. */
rs = RV_REG_ZERO;
}
if (!is64) {
if (is_signed_bpf_cond(BPF_OP(code)))
emit_sext_32_rd(&rd, ctx);
else
emit_zext_32_rd_t1(&rd, ctx);
}
e = ctx->ninsns;
/* Adjust for extra insns */
rvoff -= ninsns_rvoff(e - s);
emit_branch(BPF_OP(code), rd, rs, rvoff, ctx);
break;
case BPF_JMP | BPF_JSET | BPF_K:
case BPF_JMP32 | BPF_JSET | BPF_K:
rvoff = rv_offset(i, off, ctx);
s = ctx->ninsns;
if (is_12b_int(imm)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_andi(RV_REG_T1, rd, imm, ctx);
} else {
emit_imm(RV_REG_T1, imm, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_and(RV_REG_T1, rd, RV_REG_T1, ctx);
}
/* For jset32, we should clear the upper 32 bits of t1, but
* sign-extension is sufficient here and saves one instruction,
* as t1 is used only in comparison against zero.
*/
if (!is64 && imm < 0)
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addiw(RV_REG_T1, RV_REG_T1, 0, ctx);
e = ctx->ninsns;
rvoff -= ninsns_rvoff(e - s);
emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* function call */
case BPF_JMP | BPF_CALL:
{
bool fixed;
u64 addr;
mark_call(ctx);
ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass, &addr,
&fixed);
if (ret < 0)
return ret;
ret = emit_call(fixed, addr, ctx);
if (ret)
return ret;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
}
/* tail call */
case BPF_JMP | BPF_TAIL_CALL:
if (emit_bpf_tail_call(i, ctx))
return -1;
break;
/* function return */
case BPF_JMP | BPF_EXIT:
if (i == ctx->prog->len - 1)
break;
rvoff = epilogue_offset(ctx);
riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com
2020-04-07 06:16:04 +08:00
ret = emit_jump_and_link(RV_REG_ZERO, rvoff, false, ctx);
if (ret)
return ret;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* dst = imm64 */
case BPF_LD | BPF_IMM | BPF_DW:
{
struct bpf_insn insn1 = insn[1];
u64 imm64;
imm64 = (u64)insn1.imm << 32 | (u32)imm;
emit_imm(rd, imm64, ctx);
return 1;
}
/* LDX: dst = *(size *)(src + off) */
case BPF_LDX | BPF_MEM | BPF_B:
if (is_12b_int(off)) {
emit(rv_lbu(rd, off, rs), ctx);
break;
}
emit_imm(RV_REG_T1, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit(rv_lbu(rd, 0, RV_REG_T1), ctx);
if (insn_is_zext(&insn[1]))
return 1;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_LDX | BPF_MEM | BPF_H:
if (is_12b_int(off)) {
emit(rv_lhu(rd, off, rs), ctx);
break;
}
emit_imm(RV_REG_T1, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit(rv_lhu(rd, 0, RV_REG_T1), ctx);
if (insn_is_zext(&insn[1]))
return 1;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_LDX | BPF_MEM | BPF_W:
if (is_12b_int(off)) {
emit(rv_lwu(rd, off, rs), ctx);
break;
}
emit_imm(RV_REG_T1, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit(rv_lwu(rd, 0, RV_REG_T1), ctx);
if (insn_is_zext(&insn[1]))
return 1;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_LDX | BPF_MEM | BPF_DW:
if (is_12b_int(off)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_ld(rd, off, rs, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
}
emit_imm(RV_REG_T1, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
emit_ld(rd, 0, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* speculation barrier */
case BPF_ST | BPF_NOSPEC:
break;
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
/* ST: *(size *)(dst + off) = imm */
case BPF_ST | BPF_MEM | BPF_B:
emit_imm(RV_REG_T1, imm, ctx);
if (is_12b_int(off)) {
emit(rv_sb(rd, off, RV_REG_T1), ctx);
break;
}
emit_imm(RV_REG_T2, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit(rv_sb(RV_REG_T2, 0, RV_REG_T1), ctx);
break;
case BPF_ST | BPF_MEM | BPF_H:
emit_imm(RV_REG_T1, imm, ctx);
if (is_12b_int(off)) {
emit(rv_sh(rd, off, RV_REG_T1), ctx);
break;
}
emit_imm(RV_REG_T2, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit(rv_sh(RV_REG_T2, 0, RV_REG_T1), ctx);
break;
case BPF_ST | BPF_MEM | BPF_W:
emit_imm(RV_REG_T1, imm, ctx);
if (is_12b_int(off)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sw(rd, off, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
}
emit_imm(RV_REG_T2, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
emit_sw(RV_REG_T2, 0, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_ST | BPF_MEM | BPF_DW:
emit_imm(RV_REG_T1, imm, ctx);
if (is_12b_int(off)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(rd, off, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
}
emit_imm(RV_REG_T2, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
/* STX: *(size *)(dst + off) = src */
case BPF_STX | BPF_MEM | BPF_B:
if (is_12b_int(off)) {
emit(rv_sb(rd, off, rs), ctx);
break;
}
emit_imm(RV_REG_T1, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit(rv_sb(RV_REG_T1, 0, rs), ctx);
break;
case BPF_STX | BPF_MEM | BPF_H:
if (is_12b_int(off)) {
emit(rv_sh(rd, off, rs), ctx);
break;
}
emit_imm(RV_REG_T1, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
emit(rv_sh(RV_REG_T1, 0, rs), ctx);
break;
case BPF_STX | BPF_MEM | BPF_W:
if (is_12b_int(off)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sw(rd, off, rs, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
}
emit_imm(RV_REG_T1, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
emit_sw(RV_REG_T1, 0, rs, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_STX | BPF_MEM | BPF_DW:
if (is_12b_int(off)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(rd, off, rs, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
}
emit_imm(RV_REG_T1, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
emit_sd(RV_REG_T1, 0, rs, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
break;
case BPF_STX | BPF_ATOMIC | BPF_W:
case BPF_STX | BPF_ATOMIC | BPF_DW:
if (insn->imm != BPF_ADD) {
pr_err("bpf-jit: not supported: atomic operation %02x ***\n",
insn->imm);
return -EINVAL;
}
/* atomic_add: lock *(u32 *)(dst + off) += src
* atomic_add: lock *(u64 *)(dst + off) += src
*/
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
if (off) {
if (is_12b_int(off)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addi(RV_REG_T1, rd, off, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
} else {
emit_imm(RV_REG_T1, off, ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}
rd = RV_REG_T1;
}
emit(BPF_SIZE(code) == BPF_W ?
rv_amoadd_w(RV_REG_ZERO, rs, rd, 0, 0) :
rv_amoadd_d(RV_REG_ZERO, rs, rd, 0, 0), ctx);
break;
default:
pr_err("bpf-jit: unknown opcode %02x\n", code);
return -EINVAL;
}
return 0;
}
void bpf_jit_build_prologue(struct rv_jit_context *ctx)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
{
int stack_adjust = 0, store_offset, bpf_stack_adjust;
bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
if (bpf_stack_adjust)
mark_fp(ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
if (seen_reg(RV_REG_RA, ctx))
stack_adjust += 8;
stack_adjust += 8; /* RV_REG_FP */
if (seen_reg(RV_REG_S1, ctx))
stack_adjust += 8;
if (seen_reg(RV_REG_S2, ctx))
stack_adjust += 8;
if (seen_reg(RV_REG_S3, ctx))
stack_adjust += 8;
if (seen_reg(RV_REG_S4, ctx))
stack_adjust += 8;
if (seen_reg(RV_REG_S5, ctx))
stack_adjust += 8;
if (seen_reg(RV_REG_S6, ctx))
stack_adjust += 8;
stack_adjust = round_up(stack_adjust, 16);
stack_adjust += bpf_stack_adjust;
store_offset = stack_adjust - 8;
/* First instruction is always setting the tail-call-counter
* (TCC) register. This instruction is skipped for tail calls.
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
* Force using a 4-byte (non-compressed) instruction.
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
*/
emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addi(RV_REG_SP, RV_REG_SP, -stack_adjust, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
if (seen_reg(RV_REG_RA, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(RV_REG_SP, store_offset, RV_REG_RA, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(RV_REG_SP, store_offset, RV_REG_FP, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
if (seen_reg(RV_REG_S1, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(RV_REG_SP, store_offset, RV_REG_S1, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S2, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(RV_REG_SP, store_offset, RV_REG_S2, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S3, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(RV_REG_SP, store_offset, RV_REG_S3, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S4, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(RV_REG_SP, store_offset, RV_REG_S4, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S5, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(RV_REG_SP, store_offset, RV_REG_S5, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
if (seen_reg(RV_REG_S6, ctx)) {
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
store_offset -= 8;
}
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
if (bpf_stack_adjust)
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
/* Program contains calls and tail calls, so RV_REG_TCC need
* to be saved across calls.
*/
if (seen_tail_call(ctx) && seen_call(ctx))
bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com
2020-07-21 10:52:40 +08:00
emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
ctx->stack_size = stack_adjust;
}
void bpf_jit_build_epilogue(struct rv_jit_context *ctx)
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
{
__build_epilogue(false, ctx);
bpf, riscv: add BPF JIT for RV64G This commit adds a BPF JIT for RV64G. The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64). At the moment the RISC-V Linux port does not support CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes. The implementation does not support "far branching" (>4KiB). Test results: # modprobe test_bpf test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed] # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier ... Summary: 761 PASSED, 507 SKIPPED, 2 FAILED Note that "test_verifier" was run with one build with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise many of the the tests that require unaligned access were skipped. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 0 No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS: # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled # ./test_verifier | grep -c 'NOTE.*unknown align' 59 The two failing test_verifier tests are: "ld_abs: vlan + abs, test 1" "ld_abs: jump around ld_abs" This is due to that "far branching" involved in those tests. All tests where done on QEMU (QEMU emulator version 3.1.50 (v3.1.0-688-g8ae951fbc106)). Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 20:41:22 +08:00
}