OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Thomas Weißschuh	809145f842	tools/nolibc: setvbuf: avoid unused parameter warnings This warning will be enabled later so avoid triggering it. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 05:17:07 +02:00
Thomas Weißschuh	6407750225	tools/nolibc: fix return type of getpagesize() It's documented as returning int which is also implemented by glibc and musl, so adopt that return type. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 05:17:07 +02:00
Thomas Weißschuh	f2f5eaefa1	tools/nolibc: drop unused variables Nobody needs it, get rid of it. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 05:17:07 +02:00
Yuan Tan	3ec38af6ee	tools/nolibc: add pipe() and pipe2() support According to manual page [1], posix spec [2] and source code like arch/mips/kernel/syscall.c, for historic reasons, the sys_pipe() syscall on some architectures has an unusual calling convention. It returns results in two registers which means there is no need for it to do verify the validity of a userspace pointer argument. Historically that used to be expensive in Linux. These days the performance advantage is negligible. Nolibc doesn't support the unusual calling convention above, luckily Linux provides a generic sys_pipe2() with an additional flags argument from 2.6.27. If flags is 0, then pipe2() is the same as pipe(). So here we use sys_pipe2() to implement the pipe(). pipe2() is also provided to allow users to use flags argument on demand. [1]: https://man7.org/linux/man-pages/man2/pipe.2.html [2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/pipe.html Suggested-by: Zhangjin Wu <falcon@tinylab.org> Link: https://lore.kernel.org/all/20230729100401.GA4577@1wt.eu/ Signed-off-by: Yuan Tan <tanyuan@tinylab.org> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 05:17:07 +02:00
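The approach described in 3ec38af6ee (implement pipe() on top of the generic pipe2 syscall with flags set to 0) can be sketched in ordinary userspace C as follows. This is only an illustration under stated assumptions: `sketch_pipe()`, `sketch_pipe2()` and `sketch_pipe_roundtrip()` are hypothetical names, and the libc `syscall()` wrapper stands in for nolibc's own syscall macros, so the actual nolibc code will look different.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h> /* SYS_pipe2 */
#include <unistd.h>      /* syscall(), read(), write(), close() */

/* Hypothetical wrapper: issue the generic pipe2 syscall directly. */
int sketch_pipe2(int pipefd[2], int flags)
{
	return (int)syscall(SYS_pipe2, pipefd, flags);
}

/* With flags == 0, pipe2() behaves the same as pipe(). */
int sketch_pipe(int pipefd[2])
{
	return sketch_pipe2(pipefd, 0);
}

/* Usage: push one byte through a fresh pipe and read it back. */
char sketch_pipe_roundtrip(char in)
{
	int fds[2];
	char out = 0;
	int rc = sketch_pipe(fds);

	assert(rc == 0);
	if (write(fds[1], &in, 1) != 1 || read(fds[0], &out, 1) != 1)
		out = 0;
	close(fds[0]);
	close(fds[1]);
	return out;
}
```

Going through pipe2 avoids the two-register return convention that the legacy pipe syscall has on some architectures, which is the reason the commit message gives for this design.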
Ryan Roberts	4893c22eb2	tools/nolibc/stdio: add setvbuf() to set buffering mode Add a minimal implementation of setvbuf(), which error checks the mode argument (as required by spec) and returns. Since nolibc never buffers output, nothing needs to be done. The kselftest framework recently added a call to setvbuf(). As a result, any tests that use the kselftest framework and nolibc cause a compiler error due to missing function. This provides an urgent fix for the problem which is preventing arm64 testing on linux-next. Example: clang --target=aarch64-linux-gnu -fintegrated-as -Werror=unknown-warning-option -Werror=ignored-optimization-argument -Werror=option-ignored -Werror=unused-command-line-argument --target=aarch64-linux-gnu -fintegrated-as -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \ -include ../../../../include/nolibc/nolibc.h -I../..\ -static -ffreestanding -Wall za-fork.c build/kselftest/arm64/fp/za-fork-asm.o -o build/kselftest/arm64/fp/za-fork In file included from <built-in>:1: In file included from ./../../../../include/nolibc/nolibc.h:97: In file included from ./../../../../include/nolibc/arch.h:25: ./../../../../include/nolibc/arch-aarch64.h:178:35: warning: unknown attribute 'optimize' ignored [-Wunknown-attributes] void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from za-fork.c:12: ../../kselftest.h:123:2: error: call to undeclared function 'setvbuf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] setvbuf(stdout, NULL, _IOLBF, 0); ^ ../../kselftest.h:123:24: error: use of undeclared identifier '_IOLBF' setvbuf(stdout, NULL, _IOLBF, 0); ^ 1 warning and 2 errors generated. 
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/linux-kselftest/CA+G9fYus3Z8r2cg3zLv8uH8MRrzLFVWdnor02SNr=rCz+_WGVg@mail.gmail.com/ Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
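A minimal sketch of the behaviour that 4893c22eb2 describes (validate the mode argument, then return without doing anything, since output is never buffered) might look like this. `sketch_setvbuf()` is a hypothetical name and the body is an assumption based on the commit message, not a copy of nolibc's source.

```c
#include <assert.h>
#include <stdio.h> /* FILE, size_t, _IOFBF, _IOLBF, _IONBF, EOF */

/* Hypothetical stand-in for a setvbuf() in a libc that never buffers. */
int sketch_setvbuf(FILE *stream, char *buf, int mode, size_t size)
{
	/* Only the mode is inspected; the other arguments are ignored. */
	(void)stream;
	(void)buf;
	(void)size;

	switch (mode) {
	case _IOFBF:
	case _IOLBF:
	case _IONBF:
		return 0;   /* valid mode, nothing to do */
	default:
		return EOF; /* any nonzero value reports failure */
	}
}

/* Usage: the call kselftest.h makes succeeds, a bogus mode fails. */
void sketch_setvbuf_demo(void)
{
	assert(sketch_setvbuf(stdout, NULL, _IOLBF, 0) == 0);
	assert(sketch_setvbuf(stdout, NULL, 12345, 0) != 0);
}
```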
Zhangjin Wu	c48d8af2fa	tools/nolibc: s390: shrink _start with _start_c move most of the _start operations to _start_c(), include the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	eea70cdac6	tools/nolibc: riscv: shrink _start with _start_c move most of the _start operations to _start_c(), include the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	61bd4621c0	tools/nolibc: loongarch: shrink _start with _start_c move most of the _start operations to _start_c(), include the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	431b806b9b	tools/nolibc: mips: shrink _start with _start_c move most of the _start operations to _start_c(), include the stackprotector initialization. Also clean up the instructions in delayed slots. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	539287d751	tools/nolibc: x86_64: shrink _start with _start_c move most of the _start operations to _start_c(), include the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	2ab446336b	tools/nolibc: i386: shrink _start with _start_c move most of the _start operations to _start_c(), include the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	ded8af47c2	tools/nolibc: aarch64: shrink _start with _start_c move most of the _start operations to _start_c(), include the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	61f9880721	tools/nolibc: arm: shrink _start with _start_c move most of the _start operations to _start_c(), include the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	06f2a62c81	tools/nolibc: crt.h: initialize stack protector As suggested by Thomas, It is able to move the stackprotector initialization from the assembly _start to the beginning of the new _start_c(). Let's call __stack_chk_init() in _start_c() as a preparation. Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/a00284a6-54b1-498c-92aa-44997fa78403@t-8ch.de/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	d7f16723d3	tools/nolibc: stackprotector.h: add empty __stack_chk_init for !_NOLIBC_STACKPROTECTOR Let's define an empty __stack_chk_init for the !_NOLIBC_STACKPROTECTOR branch. This allows to remove #ifdef around every call of __stack_chk_init(). Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	1733675515	tools/nolibc: add new crt.h with _start_c As the environ and _auxv support added for nolibc, the assembly _start function becomes more and more complex and therefore makes the porting of nolibc to new architectures harder and harder. To simplify portability, this C version of _start_c() is added to do most of the assembly start operations in C, which reduces the complexity a lot and will eventually simplify the porting of nolibc to the new architectures. The new _start_c() only requires a stack pointer argument, it will find argc, argv, envp/environ and _auxv for us, and then call main(), finally, it exit() with main's return status. With this new _start_c(), the future new architectures only require to add very few assembly instructions. As suggested by Thomas, users may use a different signature of main (e.g. void main(void)), a _nolibc_main alias is added for main to silence the warning about potential conflicting types. As suggested by Willy, the code is carefully polished for both smaller size and better readability with local variables and the right types. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230715095729.GC24086@1wt.eu/ Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/90fdd255-32f4-4caf-90ff-06456b53dac3@t-8ch.de/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
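The idea in 1733675515, locating argc, argv, envp and the auxiliary vector from nothing but the initial stack pointer, can be illustrated with the sketch below. It assumes the usual Linux process entry layout (argc, then the argv pointers, a NULL, the envp pointers, a NULL, then the auxv pairs). All names are hypothetical, and the stack is modelled as an array of pointer-sized `char *` words so the example stays portable C; the real `_start_c()` works on the raw stack and then calls main() and exit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for what a C startup routine has to find. */
struct sketch_startup {
	long argc;
	char **argv;
	char **envp;
	char **auxv; /* pointer-sized (type, value) pairs, kept as raw words */
};

/* sp points at the first word of the initial stack, which holds argc. */
struct sketch_startup sketch_parse_stack(char **sp)
{
	struct sketch_startup s;
	char **p;

	s.argc = (long)(intptr_t)sp[0];
	s.argv = sp + 1;
	s.envp = s.argv + s.argc + 1; /* skip argv[] and its NULL terminator */
	for (p = s.envp; *p; p++)
		;                     /* walk to the end of envp[] */
	s.auxv = p + 1;               /* auxv starts right after envp's NULL */
	return s;
}

/* Usage: parse a hand-built fake stack with one argument and one variable. */
void sketch_startup_demo(void)
{
	char arg0[] = "prog";
	char env0[] = "KEY=value";
	char *stack[] = { (char *)(intptr_t)1, arg0, NULL, env0, NULL, NULL, NULL };
	struct sketch_startup s = sketch_parse_stack(stack);

	assert(s.argc == 1 && s.argv[0] == arg0 && s.envp[0] == env0);
}
```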
Zhangjin Wu	af93807eae	tools/nolibc: remove the old sys_stat support The statx manpage [1] shows that it has been supported from Linux 4.11 and glibc 2.28, the Linux support can be checked for all of the architectures with this command: $ git grep -r statx v4.11 arch/ include/uapi/asm-generic/unistd.h | grep -E "aarch64|arm|mips|s390|x86|:include/uapi" Besides riscv and loongarch, all of the nolibc supported architectures have added sys_statx from Linux v4.11. riscv is mainlined to v4.15, loongarch is mainlined to v5.19, both of them use the generic unistd.h, so, they have added sys_statx from their first mainline versions. The current oldest stable branch is v4.14, only reserving sys_statx still preserves compatibility with all of the supported stable branches, So, let's remove the old arch related and dependent sys_stat support completely. This is friendly to the future new architecture porting. [1]: https://man7.org/linux/man-pages/man2/statx.2.html Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	bff60150f7	tools/nolibc: fix up startup failures for -O0 under gcc < 11.1.0 As gcc doc [1] shows: Most optimizations are completely disabled at -O0 or if an -O level is not set on the command line, even if individual optimization flags are specified. Test result [2] shows, gcc>=11.1.0 deviates from the above description, but before gcc 11.1.0, "-O0" still forcely uses frame pointer in the _start function even if the individual optimize("omit-frame-pointer") flag is specified. The frame pointer related operations will change the stack pointer (e.g. In x86_64, an extra "push %rbp" will be inserted at the beginning of _start) and make it differs from the one we expected, as a result, break the whole startup function. To fix up this issue, as suggested by Thomas, the individual "Os" and "omit-frame-pointer" optimize flags are used together on _start function to disable frame pointer completely even if the -O0 is set on the command line. [1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html [2]: https://lore.kernel.org/lkml/20230714094723.140603-1-falcon@tinylab.org/ Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/34b21ba5-7b59-4b3b-9ed6-ef9a3a5e06f7@t-8ch.de/ Fixes: `7f85485896` ("tools/nolibc: make compiler and assembler agree on the section around _start") Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	2023349835	tools/nolibc: arch-*.h: add missing space after ',' Fix up such errors reported by scripts/checkpatch.pl: ERROR: space required after that ',' (ctx:VxV) #148: FILE: tools/include/nolibc/arch-aarch64.h:148: +void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void) ^ ERROR: space required after that ',' (ctx:VxV) #148: FILE: tools/include/nolibc/arch-aarch64.h:148: +void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void) ^ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Thomas Weißschuh	67d108e2a2	tools/nolibc: completely remove optional environ support In commit `52e423f5b9` ("tools/nolibc: export environ as a weak symbol on i386") and friends the asm startup logic was extended to directly populate the "environ" array. This makes it impossible for "environ" to be dropped by the linker. Therefore also drop the other logic to handle non-present "environ". Also add a testcase to validate the initialization of environ. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:40:22 +02:00
Zhangjin Wu	f4191f3d52	tools/nolibc: add rmdir() support a reverse operation of mkdir() is meaningful, add rmdir() here. required by nolibc-test to remove /proc while CONFIG_PROC_FS is not enabled. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
Zhangjin Wu	788aca91ab	tools/nolibc: types.h: add RB_ flags for reboot() Both glibc and musl provide RB_ flags via <sys/reboot.h> for reboot(), they don't need to include <linux/reboot.h>, let nolibc provide RB_ flags too. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
Zhangjin Wu	4201cfce15	tools/nolibc: clean up sbrk() routine Fix up the error reported by scripts/checkpatch.pl: ERROR: do not use assignment in if condition #95: FILE: tools/include/nolibc/sys.h:95: + if ((ret = sys_brk(0)) && (sys_brk(ret + inc) == ret + inc)) Apply the new generic __sysret() to merge the SET_ERRNO() and return lines. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
Zhangjin Wu	924e9539ae	tools/nolibc: clean up mmap() routine Do several cleanups together: - Since all supported architectures have my_syscall6() now, remove the #ifdef check. - Move the mmap() related macros to tools/include/nolibc/types.h and reuse most of them from <linux/mman.h> - Apply the new generic __sysret() to convert the calling of sys_map() to oneline code Note, since MAP_FAILED is -1 on Linux, so we can use the generic __sysret() which returns -1 upon error and still satisfy user land that checks for MAP_FAILED. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230702192347.GJ16233@1wt.eu/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
Zhangjin Wu	6591be4a73	tools/nolibc: __sysret: support syscalls who return a pointer No official reference states the errno range, here aligns with musl and glibc and uses [-MAX_ERRNO, -1] instead of all negative ones. - musl: src/internal/syscall_ret.c - glibc: sysdeps/unix/sysv/linux/sysdep.h The MAX_ERRNO used by musl and glibc is 4095, just like the one nolibc defined in tools/include/nolibc/errno.h. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@1wt.eu/ Suggested-by: David Laight <David.Laight@ACULAB.COM> Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.aculab.com/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
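The range check described in 6591be4a73 treats only return values in [-MAX_ERRNO, -1] as errors, so that a syscall returning a pointer with the top bit set is not mistaken for a failure. A rough sketch, assuming MAX_ERRNO is 4095 as the message states; `sketch_sysret_ptr()` and `SKETCH_MAX_ERRNO` are hypothetical names and the body is not nolibc's actual implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095 /* value quoted in the commit message */

/* Hypothetical pointer-aware return helper. */
void *sketch_sysret_ptr(void *ret)
{
	uintptr_t v = (uintptr_t)ret;

	/* Only the top 4095 addresses encode -errno. */
	if (v >= (uintptr_t)-SKETCH_MAX_ERRNO) {
		errno = (int)-(intptr_t)v;
		return (void *)-1; /* same bit pattern as MAP_FAILED on Linux */
	}
	return ret;
}

/* Usage: a high but valid address passes through, -ENOMEM does not. */
void sketch_sysret_ptr_demo(void)
{
	void *high = (void *)((uintptr_t)-2 * 4096);

	assert(sketch_sysret_ptr(high) == high);
	assert(sketch_sysret_ptr((void *)(intptr_t)-ENOMEM) == (void *)-1);
	assert(errno == ENOMEM);
}
```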
Zhangjin Wu	6d1970e1ef	tools/nolibc: add missing my_syscall6() for mips It is able to pass the 6th argument like the 5th argument via the stack for mips, let's add a new my_syscall6() now, see [1] for details: The mips/o32 system call convention passes arguments 5 through 8 on the user stack. Both mmap() and pselect6() require my_syscall6(). [1]: https://man7.org/linux/man-pages/man2/syscall.2.html Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
Zhangjin Wu	8b9bdab635	tools/nolibc: arch-mips.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST my_syscall<N> share the same long clobber list, define a macro for them. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
Zhangjin Wu	2dca615ade	tools/nolibc: arch-loongarch.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST my_syscall<N> share the same long clobber list, define a macro for them. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
Zhangjin Wu	f09f1912e4	toolc/nolibc: arch-*.h: clean up whitespaces after __asm__ replace "__asm__ volatile" with "__asm__ volatile" and insert necessary whitespace before "\" to make sure the lines are aligned. $ sed -i -e 's/__asm__ volatile ( /__asm__ volatile ( /g' tools/include/nolibc/*.h Note, arch-s390.h uses post-tab instead of post-whitespaces, must avoid insert whitespace just before the tabs: $ sed -i -e 's/__asm__ volatile (\t/__asm__ volatile (\t/g' tools/include/nolibc/arch-*.h Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
Zhangjin Wu	f134c7066c	tools/nolibc: arch-*.h: fix up code indent errors More than 8 whitespaces of the code indent are replaced with "tab + whitespaces" to fix up such errors reported by scripts/checkpatch.pl: ERROR: code indent should use tabs where possible #64: FILE: tools/include/nolibc/arch-mips.h:64: +^I \$ ERROR: code indent should use tabs where possible #72: FILE: tools/include/nolibc/arch-mips.h:72: +^I "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \$ This command is used: $ sed -i -e '/^\t /{s/ /\t/g}' tools/include/nolibc/arch-*.h Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-23 04:38:02 +02:00
Jiri Olsa	b733eeade4	bpf: Add pid filter support for uprobe_multi link Adding support to specify pid for uprobe_multi link and the uprobes are created only for task with given pid value. Using the consumer.filter filter callback for that, so the task gets filtered during the uprobe installation. We still need to check the task during runtime in the uprobe handler, because the handler could get executed if there's another system wide consumer on the same uprobe (thanks Oleg for the insight). Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-08-21 15:51:25 -07:00
Jiri Olsa	0b779b61f6	bpf: Add cookies support for uprobe_multi link Adding support to specify cookies array for uprobe_multi link. The cookies array share indexes and length with other uprobe_multi arrays (offsets/ref_ctr_offsets). The cookies[i] value defines cookie for i-the uprobe and will be returned by bpf_get_attach_cookie helper when called from ebpf program hooked to that specific uprobe. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-08-21 15:51:25 -07:00
Jiri Olsa	89ae89f53d	bpf: Add multi uprobe link Adding new multi uprobe link that allows to attach bpf program to multiple uprobes. Uprobes to attach are specified via new link_create uprobe_multi union: struct { __aligned_u64 path; __aligned_u64 offsets; __aligned_u64 ref_ctr_offsets; __u32 cnt; __u32 flags; } uprobe_multi; Uprobes are defined for single binary specified in path and multiple calling sites specified in offsets array with optional reference counters specified in ref_ctr_offsets array. All specified arrays have length of 'cnt'. The 'flags' supports single bit for now that marks the uprobe as return probe. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-08-21 15:51:25 -07:00
Jiri Olsa	c5487f8d91	bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum Switching BPF_F_KPROBE_MULTI_RETURN macro to anonymous enum, so it'd show up in vmlinux.h. There's not functional change compared to having this as macro. Acked-by: Yafang Shao <laoar.shao@gmail.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-08-21 15:51:25 -07:00
Jiri Olsa	a3c485a5d8	bpf: Add support for bpf_get_func_ip helper for uprobe program Adding support for bpf_get_func_ip helper for uprobe program to return probed address for both uprobe and return uprobe. We discussed this in [1] and agreed that uprobe can have special use of bpf_get_func_ip helper that differs from kprobe. The kprobe bpf_get_func_ip returns: - address of the function if probe is attach on function entry for both kprobe and return kprobe - 0 if the probe is not attach on function entry The uprobe bpf_get_func_ip returns: - address of the probe for both uprobe and return uprobe The reason for this semantic change is that kernel can't really tell if the probe user space address is function entry. The uprobe program is actually kprobe type program attached as uprobe. One of the consequences of this design is that uprobes do not have its own set of helpers, but share them with kprobes. As we need different functionality for bpf_get_func_ip helper for uprobe, I'm adding the bool value to the bpf_trace_run_ctx, so the helper can detect that it's executed in uprobe context and call specific code. The is_uprobe bool is set as true in bpf_prog_run_array_sleepable, which is currently used only for executing bpf programs in uprobe. Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe to address that it's only used for uprobes and that it sets the run_ctx.is_uprobe as suggested by Yafang Shao. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> [1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Viktor Malik <vmalik@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230807085956.2344866-2-jolsa@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-07 16:42:58 -07:00
Zhangjin Wu	2d7481eb5d	tools/nolibc: unistd.h: reorder the syscall macros Tune the macros in the using order and align most of them. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-06 12:27:53 +02:00
Zhangjin Wu	d27447bc2e	tools/nolibc: sys.h: apply __sysret() helper Use __sysret() to shrink most of the library routines to oneline code. Removed 266 lines of duplicated code. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-06 12:27:53 +02:00
Zhangjin Wu	c8d54fa37c	tools/nolibc: unistd.h: apply __sysret() helper Use __sysret() to shrink the whole _syscall() to oneline code. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-06 12:27:53 +02:00
Zhangjin Wu	428905da6e	tools/nolibc: sys.h: add a syscall return helper Most of the library routines share the same syscall return logic: In general, a 0 return value indicates success. A -1 return value indicates an error, and an error number is stored in errno. [1] Let's add a __sysret() helper for the above logic to simplify the coding and shrink the code lines too. Thomas suggested to use inline function instead of macro for __sysret(). Willy suggested to make __sysret() be always inline. [1]: https://man7.org/linux/man-pages/man2/syscall.2.html Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/linux-riscv/ZH1+hkhiA2+ItSvX@1wt.eu/ Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/linux-riscv/ea4e7442-7223-4211-ba29-70821e907888@t-8ch.de/ Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-06 12:27:53 +02:00
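The shared return logic that 428905da6e factors out (a negative raw syscall result becomes -1 with errno set, anything else is passed through) can be sketched like this. `sketch_sysret()` is a hypothetical name chosen for illustration; it follows the behaviour the message describes, not necessarily the exact nolibc source.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: convert a raw kernel return value to libc style. */
static inline long sketch_sysret(long ret)
{
	if (ret < 0) {
		errno = (int)-ret; /* the kernel returns -errno on failure */
		return -1;
	}
	return ret;
}

/* Usage: success values pass through, failures set errno. */
void sketch_sysret_demo(void)
{
	errno = 0;
	assert(sketch_sysret(3) == 3 && errno == 0);
	assert(sketch_sysret(-ENOENT) == -1 && errno == ENOENT);
}
```

With such a helper, a library routine can reduce to a single line of the form `return sketch_sysret(raw_call(...));`, which is the kind of shrinkage the follow-up commits d27447bc2e and c8d54fa37c report.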
Zhangjin Wu	2f98aca8aa	tools/nolibc: fix up undeclared syscall macros with #ifdef and -ENOSYS Compiling nolibc for rv32 got such errors: nolibc/sysroot/riscv/include/sys.h: In function ‘sys_gettimeofday’: nolibc/sysroot/riscv/include/sys.h:557:21: error: ‘__NR_gettimeofday’ undeclared (first use in this function); did you mean ‘sys_gettimeofday’? 557 | return my_syscall2(__NR_gettimeofday, tv, tz); | ^~~~~~~~~~~~~~~~~ nolibc/sysroot/riscv/include/sys.h: In function ‘sys_lseek’: nolibc/sysroot/riscv/include/sys.h:675:21: error: ‘__NR_lseek’ undeclared (first use in this function) 675 | return my_syscall3(__NR_lseek, fd, offset, whence); | ^~~~~~~~~~ nolibc/sysroot/riscv/include/sys.h: In function ‘sys_wait4’: nolibc/sysroot/riscv/include/sys.h:1341:21: error: ‘__NR_wait4’ undeclared (first use in this function) 1341 | return my_syscall4(__NR_wait4, pid, status, options, rusage); If a syscall macro is not supported by a target platform, wrap it with '#ifdef' and 'return -ENOSYS' for the '#else' branch, which lets the other syscalls work as-is and allows developers to fix up the test failures reported by nolibc-test one by one later. This wraps all of the failed syscall macros with '#ifdef' and 'return -ENOSYS' for the '#else' branch, so, all of the undeclared failures are fixed. Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-riscv/5e7d2adf-e96f-41ca-a4c6-5c87a25d4c9c@app.fastmail.com/ Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-06 12:27:53 +02:00
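The '#ifdef' plus 'return -ENOSYS' pattern from 2f98aca8aa can be illustrated with a small sketch. Everything here is hypothetical: `SKETCH_NR_example` stands for a syscall number macro that the target platform does not define, and `sketch_raw_syscall0()` is a dummy stand-in for a real syscall trampoline such as the my_syscall macros named in the message.

```c
#include <assert.h>
#include <errno.h>

/* Dummy stand-in for a real syscall trampoline (hypothetical). */
long sketch_raw_syscall0(long nr)
{
	return nr;
}

/*
 * SKETCH_NR_example is deliberately left undefined to model a platform
 * that lacks the syscall: the wrapper still compiles and reports -ENOSYS
 * at run time instead of breaking the build.
 */
long sketch_sys_example(void)
{
#ifdef SKETCH_NR_example
	return sketch_raw_syscall0(SKETCH_NR_example);
#else
	return -ENOSYS;
#endif
}

/* Usage: callers see an ordinary negative errno value. */
void sketch_enosys_demo(void)
{
	assert(sketch_sys_example() == -ENOSYS);
}
```

The trade-off is that a missing syscall surfaces as a run-time test failure rather than a compile error, which is what lets the remaining syscalls keep working on the new platform.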
Zhangjin Wu	ca50df3098	tools/nolibc: fix up #error compile failures with -ENOSYS Compiling nolibc for rv32 got such errors: In file included from nolibc/sysroot/riscv/include/nolibc.h:99, from nolibc/sysroot/riscv/include/errno.h:26, from nolibc/sysroot/riscv/include/stdio.h:14, from tools/testing/selftests/nolibc/nolibc-test.c:12: nolibc/sysroot/riscv/include/sys.h:946:2: error: #error Neither __NR_ppoll nor __NR_poll defined, cannot implement sys_poll() 946 | #error Neither __NR_ppoll nor __NR_poll defined, cannot implement sys_poll() | ^~~~~ nolibc/sysroot/riscv/include/sys.h:1062:2: error: #error None of __NR_select, __NR_pselect6, nor __NR__newselect defined, cannot implement sys_select() 1062 | #error None of __NR_select, __NR_pselect6, nor __NR__newselect defined, cannot implement sys_select() If a syscall is not supported by a target platform, 'return -ENOSYS' is better than '#error', which lets the other syscalls work as-is and allows developers to fix up the test failures reported by nolibc-test one by one later. This converts all of the '#error' to 'return -ENOSYS', so, all of the '#error' failures are fixed. Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-riscv/5e7d2adf-e96f-41ca-a4c6-5c87a25d4c9c@app.fastmail.com/ Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2023-08-06 12:27:53 +02:00
Mark Brown	51e6ac1fa4	tools include: Add some common function attributes We don't have definitions of __always_unused or __noreturn in the tools version of compiler.h, add them so we can use them in kselftests. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230728-arm64-signal-memcpy-fix-v4-3-0c1290db5d46@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2023-08-04 17:36:52 +01:00
Mark Brown	e5d51a6650	tools compiler.h: Add OPTIMIZER_HIDE_VAR() Port over the definition of OPTIMIZER_HIDE_VAR() so we can use it in kselftests. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230728-arm64-signal-memcpy-fix-v4-2-0c1290db5d46@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2023-08-04 17:36:52 +01:00
Jakub Kicinski	d07b7b32da	pull-request: bpf-next 2023-08-03 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRdM/uy1Ege0+EN1fNar9k/UBDW4wUCZMvevwAKCRBar9k/UBDW 42Z0AP90hLZ9OmoghYAlALHLl8zqXuHCV8OeFXR5auqG+kkcCwEAx6h99vnh4zgP Tngj6Yid60o39/IZXXblhV37HfSiyQ8= =/kVE -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2023-08-03 We've added 54 non-merge commits during the last 10 day(s) which contain a total of 84 files changed, 4026 insertions(+), 562 deletions(-). The main changes are: 1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer, Daniel Borkmann 2) Support new insns from cpu v4 from Yonghong Song 3) Non-atomically allocate freelist during prefill from YiFei Zhu 4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu 5) Add tracepoint to xdp attaching failure from Leon Hwang 6) struct netdev_rx_queue and xdp.h reshuffling to reduce rebuild time from Jakub Kicinski * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits) net: invert the netdevice.h vs xdp.h dependency net: move struct netdev_rx_queue out of netdevice.h eth: add missing xdp.h includes in drivers selftests/bpf: Add testcase for xdp attaching failure tracepoint bpf, xdp: Add tracepoint to xdp attaching failure selftests/bpf: fix static assert compilation issue for test_cls_*.c bpf: fix bpf_probe_read_kernel prototype mismatch riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework libbpf: fix typos in Makefile tracing: bpf: use struct trace_entry in struct syscall_tp_t bpf, devmap: Remove unused dtab field from bpf_dtab_netdev bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry netfilter: bpf: Only define get_proto_defrag_hook() if necessary bpf: Fix an array-index-out-of-bounds issue in disasm.c net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn docs/bpf: Fix malformed 
documentation bpf: selftests: Add defrag selftests bpf: selftests: Support custom type and proto for client sockets bpf: selftests: Support not connecting client socket netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link ... ==================== Link: https://lore.kernel.org/r/20230803174845.825419-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-03 15:34:36 -07:00
Daniel Xu	91721c2d02	netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link This commit adds support for enabling IP defrag using pre-existing netfilter defrag support. Basically all the flag does is bump a refcnt while the link the active. Checks are also added to ensure the prog requesting defrag support is run _after_ netfilter defrag hooks. We also take care to avoid any issues w.r.t. module unloading -- while defrag is active on a link, the module is prevented from unloading. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-28 16:52:08 -07:00
Stanislav Fomichev	25b5a2a190	ynl: regenerate all headers Also add support to pass topdir to ynl-regen.sh (Jakub) and call it from the makefile to update the UAPI headers. Signed-off-by: Stanislav Fomichev <sdf@google.com> Co-developed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230727163001.3952878-4-sdf@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-28 09:33:12 -07:00
Yonghong Song	1f9a1ea821	bpf: Support new sign-extension load insns Add interpreter/jit support for new sign-extension load insns which add a new mode (BPF_MEMSX). Also add verifier support to recognize these insns and to do proper verification with new insns. In the verifier, besides deducing proper bounds for the dst_reg, probed memory access is also properly handled. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-27 18:52:33 -07:00
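The difference between an ordinary load and the sign-extension load (BPF_MEMSX) described in the commit above is only in how a narrow value is widened to 64 bits. A minimal userspace sketch in plain C, not BPF or kernel code; the helper names load_zext() and load_sext() are hypothetical and exist only for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load `size` bytes (1, 2 or 4) and zero-extend to 64 bits. */
static uint64_t load_zext(const void *p, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    switch (size) {
    case 1: { uint8_t v;  memcpy(&v, p, sizeof(v)); return v; }
    case 2: { uint16_t v; memcpy(&v, p, sizeof(v)); return v; }
    default: { uint32_t v; memcpy(&v, p, sizeof(v)); return v; }
    }
}

/* Hypothetical helper: load `size` bytes (1, 2 or 4) and sign-extend to 64 bits. */
static int64_t load_sext(const void *p, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    switch (size) {
    case 1: { int8_t v;  memcpy(&v, p, sizeof(v)); return v; }
    case 2: { int16_t v; memcpy(&v, p, sizeof(v)); return v; }
    default: { int32_t v; memcpy(&v, p, sizeof(v)); return v; }
    }
}
```

For the byte 0xff, the zero-extending helper yields 255 while the sign-extending one yields -1, which is why the verifier has to derive different bounds for the destination register in the two cases.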
Lorenz Bauer	9c02bec959	bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT sockets. This means we can't use the helper to steer traffic to Envoy, which configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing TPROXY from our setup. The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead, one of the reuseport sockets is selected by hash. This could cause dispatch to the "wrong" socket: sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup helpers unfortunately. In the tc context, L2 headers are at the start of the skb, while SK_REUSEPORT expects L3 headers instead. Instead, we execute the SK_REUSEPORT program when the assigned socket is pulled out of the skb, further up the stack. This creates some trickiness with regards to refcounting as bpf_sk_assign will put both refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU freed. We can infer that the sk_assigned socket is RCU freed if the reuseport lookup succeeds, but convincing yourself of this fact isn't straight forward. Therefore we defensively check refcounting on the sk_assign sock even though it's probably not required in practice. 
Fixes: `8e368dc72e` ("bpf: Fix use of sk->sk_reuseport from sk_assign") Fixes: `cf7fbe660f` ("bpf: Add socket assign support") Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Joe Stringer <joe@cilium.io> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/ Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-7-7021b683cdae@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-07-25 13:55:55 -07:00
Jakub Kicinski	59be3baa8d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-20 15:52:55 -07:00
Alan Maguire	41ee0145a4	bpf: sync tools/ uapi header with Seeing the following: Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h' ...so sync tools version missing some list_node/rb_tree fields. Fixes: `c3c510ce43` ("bpf: Add 'owner' field to bpf_{list,rb}_node") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/20230719162257.20818-1-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:13:09 -07:00
Daniel Borkmann	e420bed025	bpf: Add fd-based tcx multi-prog infra with link support This work refactors and adds a lightweight extension ("tcx") to the tc BPF ingress and egress data path side for allowing BPF program management based on fds via bpf() syscall through the newly added generic multi-prog API. The main goal behind this work which we also presented at LPC [0] last year and a recent update at LSF/MM/BPF this year [3] is to support long-awaited BPF link functionality for tc BPF programs, which allows for a model of safe ownership and program detachment. Given the rise in tc BPF users in cloud native environments, this becomes necessary to avoid hard to debug incidents either through stale leftover programs or 3rd party applications accidentally stepping on each others toes. As a recap, a BPF link represents the attachment of a BPF program to a BPF hook point. The BPF link holds a single reference to keep BPF program alive. Moreover, hook points do not reference a BPF link, only the application's fd or pinning does. A BPF link holds meta-data specific to attachment and implements operations for link creation, (atomic) BPF program update, detachment and introspection. The motivation for BPF links for tc BPF programs is multi-fold, for example: - From Meta: "It's especially important for applications that are deployed fleet-wide and that don't "control" hosts they are deployed to. If such application crashes and no one notices and does anything about that, BPF program will keep running draining resources or even just, say, dropping packets. We at FB had outages due to such permanent BPF attachment semantics. With fd-based BPF link we are getting a framework, which allows safe, auto-detachable behavior by default, unless application explicitly opts in by pinning the BPF link." 
[1] - From Cilium-side the tc BPF programs we attach to host-facing veth devices and phys devices build the core datapath for Kubernetes Pods, and they implement forwarding, load-balancing, policy, EDT-management, etc, within BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently experienced hard-to-debug issues in a user's staging environment where another Kubernetes application using tc BPF attached to the same prio/handle of cls_bpf, accidentally wiping all Cilium-based BPF programs from underneath it. The goal is to establish a clear/safe ownership model via links which cannot accidentally be overridden. [0,2] BPF links for tc can co-exist with non-link attachments, and the semantics are in line also with XDP links: BPF links cannot replace other BPF links, BPF links cannot replace non-BPF links, non-BPF links cannot replace BPF links and lastly only non-BPF links can replace non-BPF links. In case of Cilium, this would solve mentioned issue of safe ownership model as 3rd party applications would not be able to accidentally wipe Cilium programs, even if they are not BPF link aware. Earlier attempts [4] have tried to integrate BPF links into core tc machinery to solve cls_bpf, which has been intrusive to the generic tc kernel API with extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could be wiped from the qdisc also. Locking a tc BPF program in place this way, is getting into layering hacks given the two object models are vastly different. We instead implemented the tcx (tc 'express') layer which is an fd-based tc BPF attach API, so that the BPF link implementation blends in naturally similar to other link types which are fd-based and without the need for changing core tc internal APIs. BPF programs for tc can then be successively migrated from classic cls_bpf to the new tc BPF link without needing to change the program's source code, just the BPF loader mechanics for attaching is sufficient. 
For the current tc framework, there is no change in behavior with this change and neither does this change touch on tc core kernel APIs. The gist of this patch is that the ingress and egress hook have a lightweight, qdisc-less extension for BPF to attach its tc BPF programs, in other words, a minimal entry point for tc BPF. The name tcx has been suggested from discussion of earlier revisions of this work as a good fit, and to more easily differ between the classic cls_bpf attachment and the fd-based one. For the ingress and egress tcx points, the device holds a cache-friendly array with program pointers which is separated from control plane (slow-path) data. Earlier versions of this work used priority to determine ordering and expression of dependencies similar as with classic tc, but it was challenged that for something more future-proof a better user experience is required. Hence this resulted in the design and development of the generic attach/detach/query API for multi-progs. See prior patch with its discussion on the API design. tcx is the first user and later we plan to integrate also others, for example, one candidate is multi-prog support for XDP which would benefit and have the same 'look and feel' from API perspective. The goal with tcx is to have maximum compatibility to existing tc BPF programs, so they don't need to be rewritten specifically. Compatibility to call into classic tcf_classify() is also provided in order to allow successive migration or both to cleanly co-exist where needed given its all one logical tc layer and the tcx plus classic tc cls/act build one logical overall processing pipeline. tcx supports the simplified return codes TCX_NEXT which is non-terminating (go to next program) and terminating ones with TCX_PASS, TCX_DROP, TCX_REDIRECT. The fd-based API is behind a static key, so that when unused the code is also not entered. 
The struct tcx_entry's program array is currently static, but could be made dynamic if necessary at a point in future. The a/b pair swap design has been chosen so that for detachment there are no allocations which otherwise could fail. The work has been tested with tc-testing selftest suite which all passes, as well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB. Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews of this work. [0] https://lpc.events/event/16/contributions/1353/ [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:27 -07:00
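The return-code semantics described in the tcx commit above (TCX_NEXT is non-terminating; TCX_PASS, TCX_DROP and TCX_REDIRECT terminate) can be sketched in userspace C as a loop over an array of program pointers. This is an illustration only, not the kernel implementation: the V_* enum, its numeric values, the prog_fn type, run_progs() and the two sample programs are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Verdict names mirror the commit message; the numeric values are assumptions. */
enum verdict { V_NEXT = -1, V_PASS = 0, V_DROP = 2, V_REDIRECT = 7 };

/* Hypothetical program type: takes an opaque packet and returns a verdict. */
typedef int (*prog_fn)(void *pkt);

/*
 * Run the programs in array order and stop at the first terminating
 * verdict. If every program returns V_NEXT (or the array is empty),
 * V_NEXT is returned and the caller decides what happens next.
 */
static int run_progs(const prog_fn *progs, size_t n, void *pkt)
{
    int ret = V_NEXT;

    for (size_t i = 0; i < n; i++) {
        assert(progs[i] != NULL);
        ret = progs[i](pkt);
        if (ret != V_NEXT)
            break;
    }
    return ret;
}

/* Sample programs: each bumps a call counter passed in as the "packet". */
static int prog_count_next(void *pkt) { ++*(int *)pkt; return V_NEXT; }
static int prog_count_drop(void *pkt) { ++*(int *)pkt; return V_DROP; }
```

With the chain { prog_count_next, prog_count_drop, prog_count_next }, the third program never runs because the second one returns a terminating verdict. Keeping the program pointers in a flat array is what the commit message refers to as the cache-friendly layout.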
Daniel Borkmann	053c8e1f23	bpf: Add generic attach/detach/query API for multi-progs This adds a generic layer called bpf_mprog which can be reused by different attachment layers to enable multi-program attachment and dependency resolution. In-kernel users of the bpf_mprog don't need to care about the dependency resolution internals, they can just consume it with few API calls. The initial idea of having a generic API sparked out of discussion [0] from an earlier revision of this work where tc's priority was reused and exposed via BPF uapi as a way to coordinate dependencies among tc BPF programs, similar as-is for classic tc BPF. The feedback was that priority provides a bad user experience and is hard to use [1], e.g.: I cannot help but feel that priority logic copy-paste from old tc, netfilter and friends is done because "that's how things were done in the past". [...] Priority gets exposed everywhere in uapi all the way to bpftool when it's right there for users to understand. And that's the main problem with it. The user don't want to and don't need to be aware of it, but uapi forces them to pick the priority. [...] Your cover letter [0] example proves that in real life different service pick the same priority. They simply don't know any better. Priority is an unnecessary magic that apps _have_ to pick, so they just copy-paste and everyone ends up using the same. The course of the discussion showed more and more the need for a generic, reusable API where the "same look and feel" can be applied for various other program types beyond just tc BPF, for example XDP today does not have multi- program support in kernel, but also there was interest around this API for improving management of cgroup program types. Such common multi-program management concept is useful for BPF management daemons or user space BPF applications coordinating internally about their attachments. 
Both from Cilium and Meta side [2], we've collected the following requirements for a generic attach/detach/query API for multi-progs which has been implemented as part of this work: - Support prog-based attach/detach and link API - Dependency directives (can also be combined): - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none} - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that user space application does not need CAP_SYS_ADMIN to retrieve foreign fds via bpf_*_get_fd_by_id() - BPF_F_LINK flag as {prog,link} toggle - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and BPF_F_AFTER will just append for attaching - Enforced only at attach time - BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their own infra for replacing their internal prog - If no flags are set, then it's default append behavior for attaching - Internal revision counter and optionally being able to pass expected_revision - User space application can query current state with revision, and pass it along for attachment to assert current state before doing updates - Query also gets extension for link_ids array and link_attach_flags: - prog_ids are always filled with program IDs - link_ids are filled with link IDs when link was used, otherwise 0 - {prog,link}_attach_flags for holding {prog,link}-specific flags - Must be easy to integrate/reuse for in-kernel users The uapi-side changes needed for supporting bpf_mprog are rather minimal, consisting of the additions of the attachment flags, revision counter, and expanding existing union with relative_{fd,id} member. The bpf_mprog framework consists of an bpf_mprog_entry object which holds an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path structure) is part of bpf_mprog_bundle. Both have been separated, so that fast-path gets efficient packing of bpf_prog pointers for maximum cache efficiency. 
Also, array has been chosen instead of linked list or other structures to remove unnecessary indirections for a fast point-to-entry in tc for BPF. The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that in case of updates the peer bpf_mprog_entry is populated and then just swapped which avoids additional allocations that could otherwise fail, for example, in detach case. bpf_mprog_{fp,cp} arrays are currently static, but they could be converted to dynamic allocation if necessary at a point in future. Locking is deferred to the in-kernel user of bpf_mprog, for example, in case of tcx which uses this API in the next patch, it piggybacks on rtnl. An extensive test suite for checking all aspects of this API for prog-based attach/detach and link API comes as BPF selftests in this series. Thanks also to Andrii Nakryiko for early API discussions wrt Meta's BPF prog management. [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:27 -07:00
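The attach semantics listed in the bpf_mprog commit above (append by default, prepend or append with BPF_F_BEFORE/BPF_F_AFTER when no relative program is given, placement relative to another program otherwise, an optional expected_revision check, and an a/b pair that is populated and then swapped) can be sketched as a small userspace model. Everything here is hypothetical: struct entry, struct bundle, F_BEFORE, F_AFTER, MAX_PROGS and attach() are stand-ins, not the kernel's types or API. For simplicity the sketch rejects combined directives, although the commit message says they can be combined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PROGS 8          /* arbitrary capacity for this sketch */
#define F_BEFORE  (1u << 0)  /* hypothetical stand-in for BPF_F_BEFORE */
#define F_AFTER   (1u << 1)  /* hypothetical stand-in for BPF_F_AFTER */

struct entry { uint32_t ids[MAX_PROGS]; int count; };

struct bundle {
    struct entry pair[2];    /* a/b pair: one active entry and its peer */
    int active;              /* index of the active entry */
    uint64_t revision;       /* bumped on every successful update */
};

static void bundle_init(struct bundle *b)
{
    memset(b, 0, sizeof(*b));
    b->revision = 1;
}

static int find_id(const struct entry *e, uint32_t id)
{
    for (int i = 0; i < e->count; i++)
        if (e->ids[i] == id)
            return i;
    return -1;
}

/* Returns 0 on success, -1 on failure. An expected_revision of 0 means "don't check". */
static int attach(struct bundle *b, uint32_t id, unsigned flags,
                  uint32_t relative_id, uint64_t expected_revision)
{
    const struct entry *cur = &b->pair[b->active];
    struct entry *peer = &b->pair[!b->active];
    int pos;

    assert(id != 0);
    if (expected_revision && expected_revision != b->revision)
        return -1;
    if (cur->count == MAX_PROGS || find_id(cur, id) >= 0)
        return -1;
    if ((flags & F_BEFORE) && (flags & F_AFTER))
        return -1;           /* combined directives are not modelled here */

    if (relative_id) {
        int rel = find_id(cur, relative_id);

        if (rel < 0 || !(flags & (F_BEFORE | F_AFTER)))
            return -1;
        pos = (flags & F_BEFORE) ? rel : rel + 1;
    } else {
        pos = (flags & F_BEFORE) ? 0 : cur->count;   /* prepend or append */
    }

    /* Populate the peer entry, then swap; nothing is allocated on this path. */
    memcpy(peer->ids, cur->ids, pos * sizeof(uint32_t));
    peer->ids[pos] = id;
    memcpy(peer->ids + pos + 1, cur->ids + pos,
           (cur->count - pos) * sizeof(uint32_t));
    peer->count = cur->count + 1;
    b->active = !b->active;
    b->revision++;
    return 0;
}
```

A caller that queried the state at revision 3 and passes expected_revision 3 after someone else has attached a program gets a failure instead of silently modifying a list it no longer understands, which is the purpose of the revision counter described above.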
Magnus Karlsson	f540d44e05	selftests/xsk: add basic multi-buffer test Add the first basic multi-buffer test that sends a stream of 9K packets and validates that they are received at the other end. In order to enable sending and receiving multi-buffer packets, code that sets the MTU is introduced as well as modifications to the XDP programs so that they signal that they are multi-buffer enabled. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-20-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:50 -07:00
Magnus Karlsson	17f1034dd7	selftests/xsk: transmit and receive multi-buffer packets Add the ability to send and receive packets that are larger than the size of a umem frame, using the AF_XDP/XDP multi-buffer support. There are three pieces of code that need to be changed to achieve this: the Rx path, the Tx path, and the validation logic. Both the Rx path and the Tx path could only deal with a single fragment per packet. The Tx path is extended with a new function called pkt_nb_frags() that can be used to retrieve the number of fragments a packet will consume. We then create this many fragments in a loop and fill the N-1 first ones to the max size limit to use the buffer space efficiently, and the Nth one with whatever data is left. This goes on until we have filled in at most BATCH_SIZE worth of descriptors and fragments. If we detect that the next packet would lead to BATCH_SIZE number of fragments sent being exceeded, we do not send this packet and finish the batch. This packet is instead sent in the next iteration of BATCH_SIZE fragments. For Rx, we loop over all fragments we receive as usual, but for every descriptor that we receive we call a new validation function called is_frag_valid() to validate the consistency of this fragment. The code then checks if the packet continues in the next frame. If so, it loops over the next packet and performs the same validation. Once we have received the last fragment of the packet we also call the function is_pkt_valid() to validate the packet as a whole. If we get to the end of the batch and we are not at the end of the current packet, we back out the partial packet and end the loop. Once we get into the receive loop next time, we start over from the beginning of that packet. This keeps the code simpler at the cost of some performance. The validation function is_frag_valid() checks that the sequence and packet numbers are correct at the start and end of each fragment. 
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-19-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:50 -07:00
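The Tx-side arithmetic described in the commit above (N fragments per packet, the first N-1 filled to the maximum size, the last one holding the remainder, and a batch that stops before the fragment budget would be exceeded) can be sketched as follows. This is not the selftest code: nb_frags(), frag_len() and pkts_in_batch() are hypothetical helpers, and treating the umem frame size as the per-fragment limit is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fragments a packet of `len` bytes occupies with `frame_size`-byte frames. */
static uint32_t nb_frags(uint32_t len, uint32_t frame_size)
{
    assert(frame_size > 0);
    if (len == 0)
        return 1;            /* assumption: an empty packet still uses one descriptor */
    return len / frame_size + (len % frame_size != 0);
}

/* Length of fragment `i` (0-based): the first N-1 are full, the last holds the rest. */
static uint32_t frag_len(uint32_t len, uint32_t frame_size, uint32_t i)
{
    uint32_t n = nb_frags(len, frame_size);

    assert(i < n);
    return (i + 1 < n) ? frame_size : len - (n - 1) * frame_size;
}

/*
 * How many whole packets fit into one batch of at most `batch_size`
 * fragments. A packet that would exceed the budget is left for the
 * next batch rather than being split across batches.
 */
static uint32_t pkts_in_batch(const uint32_t *lens, uint32_t nb_pkts,
                              uint32_t frame_size, uint32_t batch_size)
{
    uint32_t used = 0, i;

    for (i = 0; i < nb_pkts; i++) {
        uint32_t n = nb_frags(lens[i], frame_size);

        if (used + n > batch_size)
            break;
        used += n;
    }
    return i;
}
```

With 4096-byte frames, a 9000-byte packet occupies three fragments of 4096, 4096 and 808 bytes, so a batch limited to four fragments carries only one such packet and defers the next one.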
Maciej Fijalkowski	13ce2daa25	xsk: add new netlink attribute dedicated for ZC max frags Introduce a new netlink attribute, NETDEV_A_DEV_XDP_ZC_MAX_SEGS, that carries the maximum number of fragments that the underlying ZC driver is able to handle on the TX side. It is included in the netlink response only when the driver supports ZC. Any value higher than 1 implies multi-buffer ZC support on the underlying device. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-11-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:49 -07:00
Arnaldo Carvalho de Melo	28e898ffa0	tools include UAPI: Sync the sound/asound.h copy with the kernel sources Picking the changes from: `01dfa8e969` ("ALSA: ump: Add info flag bit for static blocks") `e375b8a045` ("ALSA: ump: Add more attributes to UMP EP and FB info") `30fc139260` ("ALSA: ump: Add ioctls to inquiry UMP EP and Block info via control API") `127ae6f6da` ("ALSA: rawmidi: Skip UMP devices at SNDRV_CTL_IOCTL_RAWMIDI_NEXT_DEVICE") `e3a8a5b726` ("ALSA: rawmidi: UMP support") `a4bb75c4f1` ("ALSA: uapi: pcm: control the filling of the silence samples for drain") That harvests some new ioctls: $ tools/perf/trace/beauty/sndrv_ctl_ioctl.sh > before.ctl $ tools/perf/trace/beauty/sndrv_pcm_ioctl.sh > before.pcm $ cp include/uapi/sound/asound.h tools/include/uapi/sound/asound.h $ tools/perf/trace/beauty/sndrv_ctl_ioctl.sh > after.ctl $ tools/perf/trace/beauty/sndrv_pcm_ioctl.sh > after.pcm $ diff -u before.ctl after.ctl --- before.ctl 2023-07-14 10:17:00.319591889 -0300 +++ after.ctl 2023-07-14 10:17:24.668248373 -0300 @@ -22,6 +22,9 @@ [0x40] = "RAWMIDI_NEXT_DEVICE", [0x41] = "RAWMIDI_INFO", [0x42] = "RAWMIDI_PREFER_SUBDEVICE", + [0x43] = "UMP_NEXT_DEVICE", + [0x44] = "UMP_ENDPOINT_INFO", + [0x45] = "UMP_BLOCK_INFO", [0xd0] = "POWER", [0xd1] = "POWER_STATE", }; $ diff -u before.pcm after.pcm $ Now those will be decoded when they appear, see a system wide 'perf trace' session example here: # perf trace -e ioctl --max-events=10 0.000 ( 0.010 ms): gnome-shell/2240 ioctl(fd: 9, cmd: DRM_MODE_RMFB, arg: 0x7ffc0041d54c) = 0 2.444 ( 0.005 ms): wireplumber/2304 ioctl(fd: 47, cmd: TIOCOUTQ, arg: 0x7f16e9afea24) = 0 2.452 ( 0.002 ms): wireplumber/2304 ioctl(fd: 47, cmd: TIOCOUTQ, arg: 0x7f16e9afea24) = 0 11.348 ( 0.010 ms): gnome-shell/2240 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffc0041ccf0) = 0 11.406 ( 0.037 ms): gnome-shel:cs0/2259 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7f3cf69fdc60) = 0 11.476 ( 0.009 ms): gnome-shell/2240 ioctl(fd: 9, cmd: 
DRM_MODE_ADDFB2, arg: 0x7ffc0041ce50) = 0 11.497 ( 0.019 ms): gnome-shell/2240 ioctl(fd: 9, cmd: DRM_MODE_ATOMIC, arg: 0x7ffc0041cdf0) = 0 12.481 ( 0.020 ms): firefox:cs0/3651 ioctl(fd: 40, cmd: DRM_I915_IRQ_EMIT, arg: 0x7f1c365fea60) = 0 12.529 ( 0.009 ms): firefox:cs0/3651 ioctl(fd: 40, cmd: DRM_I915_IRQ_EMIT, arg: 0x7f1c365feab0) = 0 12.624 ( 0.018 ms): firefox:cs0/3651 ioctl(fd: 40, cmd: DRM_I915_IRQ_EMIT, arg: 0x7f1c365fea30) = 0 # Silencing these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/lkml/ZLFOrTE2+xZBgHGe@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-07-14 10:33:50 -03:00
Arnaldo Carvalho de Melo	7b86159355	tools include UAPI: Sync linux/vhost.h with the kernel sources To get the changes in: `228a27cf78` ("vhost: Allow worker switching while work is queueing") `c1ecd8e950` ("vhost: allow userspace to create workers") To pick up these changes and support them: $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after $ diff -u before after --- before 2023-07-14 09:58:14.268249807 -0300 +++ after 2023-07-14 09:58:23.041493892 -0300 @@ -10,6 +10,7 @@ [0x12] = "SET_VRING_BASE", [0x13] = "SET_VRING_ENDIAN", [0x14] = "GET_VRING_ENDIAN", + [0x15] = "ATTACH_VRING_WORKER", [0x20] = "SET_VRING_KICK", [0x21] = "SET_VRING_CALL", [0x22] = "SET_VRING_ERR", @@ -31,10 +32,12 @@ [0x7C] = "VDPA_SET_GROUP_ASID", [0x7D] = "VDPA_SUSPEND", [0x7E] = "VDPA_RESUME", + [0x9] = "FREE_WORKER", }; static const char *vhost_virtio_ioctl_read_cmds[] = { [0x00] = "GET_FEATURES", [0x12] = "GET_VRING_BASE", + [0x16] = "GET_VRING_WORKER", [0x26] = "GET_BACKEND_FEATURES", [0x70] = "VDPA_GET_DEVICE_ID", [0x71] = "VDPA_GET_STATUS", @@ -44,6 +47,7 @@ [0x79] = "VDPA_GET_CONFIG_SIZE", [0x7A] = "VDPA_GET_AS_NUM", [0x7B] = "VDPA_GET_VRING_GROUP", + [0x8] = "NEW_WORKER", [0x80] = "VDPA_GET_VQS_COUNT", [0x81] = "VDPA_GET_GROUP_NUM", }; $ For instance, see how those 'cmd' ioctl arguments get translated, now ATTACH_VRING_WORKER, GET_VRING_WORKER and NEW_WORKER, will be as well: # perf trace -a -e ioctl --max-events=10 0.000 ( 0.011 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1) = 0 21.353 ( 0.014 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1) = 0 25.766 ( 0.014 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740) = 0 25.845 ( 0.034 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70) = 0 25.916 ( 0.011 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 
0x7ffe4a22c8a0) = 0 25.941 ( 0.025 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ATOMIC, arg: 0x7ffe4a22c840) = 0 32.915 ( 0.009 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_RMFB, arg: 0x7ffe4a22cf9c) = 0 42.522 ( 0.013 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740) = 0 42.579 ( 0.031 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70) = 0 42.644 ( 0.010 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 0x7ffe4a22c8a0) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZLFJ%2FRsDGYiaH5nj@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-07-14 10:16:03 -03:00
Jakub Kicinski	d2afa89f66	for-netdev -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmSwqwoACgkQ6rmadz2v bTqOHRAAn+fzTLqUqsveFQcxOkie5MPHxKoOTjG4+yFR7rzPkU6Mn5RX3w5yFzSn RqutwykF9OgipAzC3QXv4pRJuq6Gia5nvwUSDP4CX273ljyeF54DK7HfopE1+YrK HXyBWZvVvMZP6q7qQyQ3qtbHZSjs5XP/M6YBlJ5zo/BTLFCyvbSDP14YKEqcBkWG ld72ElXFxlnr/zEfRjzBCfMlbmgeHLO0SiHS/9827zEmNP1AAH5/ETA7/rJ7yCJs QNQUIoJWob8xm5FMJ6CU/+sOqXR1CY053meGJFFBX5pvVD/CLRhrwHn0IMCyQqmh wKR5waeXhpl/CKNeFuxXVMNFiXbqBb/0LYJaJtrMysjMLTsQ9X7NkrDBa/+kYGyZ +ghGlaMQvPqUGg0rLH2nl9JNB8Ne/8prLMsAKUWnPuOo+Q03j054gnqhGeNtDd5b gpSk+7x93PlhGcegBV1Wk8dkiGC5V9nTVNxg40XQUCs4k9L/8Vjc35Tjqx7nBTNH DiFD24DDKUZacw9L6nEqvLF/N2fiRjtUZnVPC0yn/annyBcfX1s+ZH2Tu1F6Qk38 QMfBCnt12exmsiDoxdzzGJtjHnS/k5fsaKjlR21mOyMrIH7ipltr5UHHrdr1hBP6 24uSeTImvQQKDi+9IuXN127jZDOupKqVS6csrA0ZXrlKWh2HR+U= =GVUB -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-07-13 We've added 67 non-merge commits during the last 15 day(s) which contain a total of 106 files changed, 4444 insertions(+), 619 deletions(-). The main changes are: 1) Fix bpftool build in presence of stale vmlinux.h, from Alexander Lobakin. 2) Introduce bpf_me_mcache_free_rcu() and fix OOM under stress, from Alexei Starovoitov. 3) Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling, from Andrii Nakryiko. 4) Introduce bpf map element count, from Anton Protopopov. 5) Check skb ownership against full socket, from Kui-Feng Lee. 6) Support for up to 12 arguments in BPF trampoline, from Menglong Dong. 7) Export rcu_request_urgent_qs_task, from Paul E. McKenney. 8) Fix BTF walking of unions, from Yafang Shao. 9) Extend link_info for kprobe_multi and perf_event links, from Yafang Shao. 
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (67 commits) selftests/bpf: Add selftest for PTR_UNTRUSTED bpf: Fix an error in verifying a field in a union selftests/bpf: Add selftests for nested_trust bpf: Fix an error around PTR_UNTRUSTED selftests/bpf: add testcase for TRACING with 6+ arguments bpf, x86: allow function arguments up to 12 for TRACING bpf, x86: save/restore regs with BPF_DW size bpftool: Use "fallthrough;" keyword instead of comments bpf: Add object leak check. bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu. bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu(). selftests/bpf: Improve test coverage of bpf_mem_alloc. rcu: Export rcu_request_urgent_qs_task() bpf: Allow reuse from waiting_for_gp_ttrace list. bpf: Add a hint to allocated objects. bpf: Change bpf_mem_cache draining process. bpf: Further refactor alloc_bulk(). bpf: Factor out inc/dec of active flag into helpers. bpf: Refactor alloc_bulk(). bpf: Let free_all() return the number of freed elements. ... ==================== Link: https://lore.kernel.org/r/20230714020910.80794-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-13 19:13:24 -07:00
Yafang Shao	1b715e1b0e	bpf: Support ->fill_link_info for perf_event By introducing support for ->fill_link_info to the perf_event link, users gain the ability to inspect it using `bpftool link show`. While the current approach involves accessing this information via `bpftool perf show`, consolidating link information for all link types in one place offers greater convenience. Additionally, this patch extends support to the generic perf event, which is not currently accommodated by `bpftool perf show`. While only the perf type and config are exposed to userspace, other attributes such as sample_period and sample_freq are ignored. It's important to note that if kptr_restrict is not permitted, the probed address will not be exposed, maintaining security measures. A new enum bpf_perf_event_type is introduced to help the user understand which struct is relevant. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230709025630.3735-9-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-11 20:07:51 -07:00
Yafang Shao	7ac8d0d261	bpf: Support ->fill_link_info for kprobe_multi With the addition of support for fill_link_info to the kprobe_multi link, users will gain the ability to inspect it conveniently using `bpftool link show`. This enhancement provides valuable information to the user, including the count of probed functions and their respective addresses. It's important to note that if the kptr_restrict setting is not permitted, the probed address will not be exposed, ensuring security. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230709025630.3735-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-11 20:07:50 -07:00
Arnaldo Carvalho de Melo	ad07149f34	tools headers UAPI: Sync linux/prctl.h with the kernel sources To pick the changes in: `1fd96a3e9d` ("riscv: Add prctl controls for userspace vector management") That adds some RISC-V specific prctl options: $ tools/perf/trace/beauty/prctl_option.sh > before $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h $ tools/perf/trace/beauty/prctl_option.sh > after $ diff -u before after --- before 2023-07-11 13:22:01.928705942 -0300 +++ after 2023-07-11 13:22:36.342645970 -0300 @@ -63,6 +63,8 @@ [66] = "GET_MDWE", [67] = "SET_MEMORY_MERGE", [68] = "GET_MEMORY_MERGE", + [69] = "RISCV_V_SET_CONTROL", + [70] = "RISCV_V_GET_CONTROL", }; static const char *prctl_set_mm_options[] = { [1] = "START_CODE", $ That now will be used to decode the syscall option and also to compose filters, for instance: [root@five ~]# perf trace -e syscalls:sys_enter_prctl --filter option==SET_NAME 0.000 Isolated Servi/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23f13b7aee) 0.032 DOM Worker/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23deb25670) 7.920 :3474328/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fbb10) 7.935 StreamT~s #374/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fb970) 8.400 Isolated Servi/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24bab10) 8.418 StreamT~s #374/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24ba970) ^C[root@five ~]# This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Chiu <andy.chiu@sifive.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/lkml/ZK2DhOB6JJKu2A7M@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	
2023-07-11 13:30:40 -03:00
Arnaldo Carvalho de Melo	920b91d927	tools include UAPI: Sync linux/mount.h copy with the kernel sources To pick the changes from: `6ac3928156` ("fs: allow to mount beneath top mount") That, after a fix to the move_mount_flags.sh script, harvests the new MOVE_MOUNT_BENEATH move_mount flag: $ tools/perf/trace/beauty/move_mount_flags.sh > before $ cp include/uapi/linux/mount.h tools/include/uapi/linux/mount.h $ tools/perf/trace/beauty/move_mount_flags.sh > after $ $ diff -u before after --- before 2023-07-11 12:38:49.244886707 -0300 +++ after 2023-07-11 12:51:15.125255940 -0300 @@ -6,4 +6,5 @@ [ilog2(0x00000020) + 1] = "T_AUTOMOUNTS", [ilog2(0x00000040) + 1] = "T_EMPTY_PATH", [ilog2(0x00000100) + 1] = "SET_GROUP", + [ilog2(0x00000200) + 1] = "BENEATH", }; $ That will then be properly decoded when used in tools like: # perf trace -e move_mount This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h Cc: Christian Brauner <brauner@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZK17kifP%2FiYl+Hcc@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-07-11 13:01:23 -03:00
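The generated table in the diff above is indexed by ilog2(flag) + 1, that is, by bit position plus one. A minimal sketch of how such a table could be used to turn a flags word into text follows. The table entries are copied from the diff; decode_flags() and NR_ENTRIES are hypothetical and are not the perf implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Entries as shown in the diff above; index = bit position + 1. */
static const char *move_mount_flags[] = {
    [5 + 1] = "T_AUTOMOUNTS",   /* 0x00000020 */
    [6 + 1] = "T_EMPTY_PATH",   /* 0x00000040 */
    [8 + 1] = "SET_GROUP",      /* 0x00000100 */
    [9 + 1] = "BENEATH",        /* 0x00000200 */
};

#define NR_ENTRIES (sizeof(move_mount_flags) / sizeof(move_mount_flags[0]))

/*
 * Hypothetical decoder: writes the names of all set bits into `buf`,
 * separated by '|'. Bits without a table entry are printed in hex.
 * Stops early if the buffer is too small. Returns the string length.
 */
static size_t decode_flags(unsigned long flags, char *buf, size_t size)
{
    size_t len = 0;

    assert(size > 0);
    buf[0] = '\0';
    for (unsigned bit = 0; bit < 8 * sizeof(flags); bit++) {
        const char *name = NULL;
        int n;

        if (!(flags & (1UL << bit)))
            continue;
        if (bit + 1 < NR_ENTRIES)
            name = move_mount_flags[bit + 1];
        if (name)
            n = snprintf(buf + len, size - len, "%s%s", len ? "|" : "", name);
        else
            n = snprintf(buf + len, size - len, "%s%#lx", len ? "|" : "", 1UL << bit);
        if (n < 0 || (size_t)n >= size - len)
            break;               /* output truncated */
        len += (size_t)n;
    }
    return len;
}
```

Offsetting the index by one keeps slot 0 free, so a flag with value 0x1 (bit 0) would land in slot 1 rather than colliding with an "empty" entry.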
Arnaldo Carvalho de Melo	225bbf44bf	tools headers UAPI: Sync linux/kvm.h with the kernel sources To pick the changes in: `89d01306e3` ("RISC-V: KVM: Implement device interface for AIA irqchip") `22725266bd` ("KVM: Fix comment for KVM_ENABLE_CAP") `2f440b72e8` ("KVM: arm64: Add KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE") That just rebuilds perf, as these patches don't add any new KVM ioctl to be harvested for the 'perf trace' ioctl syscall argument beautifiers. This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Anup Patel <apatel@ventanamicro.com> Cc: Binbin Wu <binbin.wu@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Ricardo Koller <ricarkol@google.com> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/lkml/ZK12+virXMIXMysy@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-07-11 12:36:38 -03:00
Arnaldo Carvalho de Melo	48fa42c945	tools headers uapi: Sync linux/fcntl.h with the kernel sources To get the changes in: `96b2b072ee` ("exportfs: allow exporting non-decodeable file handles to userspace") That don't add anything that is handled by existing hard coded tables or table generation scripts. This silences this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZK11P5AwRBUxxutI@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-07-11 12:29:23 -03:00
Arnaldo Carvalho de Melo	9350a91791	tools headers UAPI: Sync files changed by new cachestat syscall with the kernel sources To pick the changes in these csets: `cf264e1329` ("cachestat: implement cachestat syscall") That add support for this new syscall in tools such as 'perf trace'. For instance, this is now possible: # perf trace -e cachestat ^C[root@five ~]# # perf trace -v -e cachestat Using CPUID AuthenticAMD-25-21-0 event qualifier tracepoint filter: (common_pid != 3163687 && common_pid != 3147) && (id == 451) mmap size 528384B ^C[root@five ~] # perf trace -v -e stat --max-events=10 Using CPUID AuthenticAMD-25-21-0 event qualifier tracepoint filter: (common_pid != 3163713 && common_pid != 3147) && (id == 4 || id == 5 || id == 6 || id == 136 || id == 137 || id == 138 || id == 262 || id == 332 || id == 451) mmap size 528384B 0.000 ( 0.009 ms): Cache2 I/O/4544 statfs(pathname: 0x45635288, buf: 0x7f8745725b60) = 0 0.012 ( 0.003 ms): Cache2 I/O/4544 newfstatat(dfd: CWD, filename: 0x45635288, statbuf: 0x7f874569d250) = 0 0.036 ( 0.002 ms): Cache2 I/O/4544 newfstatat(dfd: 138, filename: 0x541b7093, statbuf: 0x7f87457256f0, flag: 4096) = 0 0.372 ( 0.006 ms): Cache2 I/O/4544 statfs(pathname: 0x45635288, buf: 0x7f8745725b10) = 0 0.379 ( 0.003 ms): Cache2 I/O/4544 newfstatat(dfd: CWD, filename: 0x45635288, statbuf: 0x7f874569d250) = 0 0.390 ( 0.002 ms): Cache2 I/O/4544 newfstatat(dfd: 138, filename: 0x541b7093, statbuf: 0x7f87457256a0, flag: 4096) = 0 0.609 ( 0.005 ms): Cache2 I/O/4544 statfs(pathname: 0x45635288, buf: 0x7f8745725b60) = 0 0.615 ( 0.003 ms): Cache2 I/O/4544 newfstatat(dfd: CWD, filename: 0x45635288, statbuf: 0x7f874569d250) = 0 0.625 ( 0.002 ms): Cache2 I/O/4544 newfstatat(dfd: 138, filename: 0x541b7093, statbuf: 0x7f87457256f0, flag: 4096) = 0 0.826 ( 0.005 ms): Cache2 I/O/4544 statfs(pathname: 0x45635288, buf: 0x7f8745725b10) = 0 # That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints. $ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -w sys_cachestat tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:451 n64 cachestat sys_cachestat tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:451 common cachestat sys_cachestat tools/perf/arch/s390/entry/syscalls/syscall.tbl:451 common cachestat sys_cachestat sys_cachestat tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:451 common cachestat sys_cachestat $ $ grep -w cachestat /tmp/build/perf-tools/arch/x86/include/generated/asm/syscalls_64.c [451] = "cachestat", $ This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Link: https://lore.kernel.org/lkml/ZK1pVBJpbjujJNJW@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-07-11 11:41:15 -03:00
Arnaldo Carvalho de Melo	142256d2f4	tools headers UAPI: Sync drm/i915_drm.h with the kernel sources `81b1b599df` ("drm/i915: Allow user to set cache at BO creation") `98d2722a85` ("drm/i915/huc: differentiate the 2 steps of the MTL HuC auth flow") `bc4be0a38b` ("drm/i915/pmu: Prepare for multi-tile non-engine counters") `d1da138f24` ("drm/i915/uapi/pxp: Add a GET_PARAM for PXP") That adds some ioctls but use the __I915_PMU_OTHER() macro, not supported yet in the tools/perf/trace/beauty/drm_ioctl.sh conversion script. This silences this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Fei Yang <fei.yang@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/lkml/ZK1R%2FIyWcUKYQbQV@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-07-11 11:41:15 -03:00
Linus Torvalds	7b82e90411	asm-generic updates for 6.5 These are cleanups for architecture specific header files: - the comments in include/linux/syscalls.h have gone out of sync and are really pointless, so these get removed - The asm/bitsperlong.h header no longer needs to be architecture specific on modern compilers, so use a generic version for newer architectures that use new enough userspace compilers - A cleanup for virt_to_pfn/virt_to_bus to have proper type checking, forcing the use of pointers -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmSl138ACgkQYKtH/8kJ UieqWxAA2WjNVfyuieYckglOVE0PZPs2fzCwyzTY5iUTH3gE5cBFWJDWcg2EnouG v3X3htEQcowYWaCF9+rypQXaGiSx4WXi2Bjxnz3D/BcreqWPI4eSQ0fpGG5SURTY 2zYF72GTt4JGR++l+7/R9MZwPbwYDT9BsD5tkel8PxnyVLM6/c5xFvbjzRSKFE8x SMN1jGZ62ITLNf/8coAOEPNxBYtDT6yQyu7P2sx5cd65LAQq9yLKjFklnBBovgWT OoCIZAdGkhcNwOh1LjyHcdNdpfNJGceKyqKPqty07IhCQuF2jxiyFYFzuBbeyQfE S0itN8o/MIfUmxaQl3e8dPAVb1RlNVr1zfQ6y4tUtWNdkNL2WwSnSQSRHrBfHxCQ QCF++PMeFcLhGwMYtqdNJ7XGLQ0PsjD74pRf0vo+vjmqDk2BJsJBP57VU+8MJn5r SoxqnJ0WxLvm1TfrNKusV7zMNWquc2duJDW40zsOssP4itjYELSI6qa56qmzlqmX zKmRx6mxAlx9RRK8FHXFYHbz3p93vv8z9vTOZV3AjIjjED960CLknUAwCC8FoJyz 9b5wyMXsLQHQjGt8luAvPc6OiU0EiU9a4SPK+feWcv27serFvnjJlRTS/yG2Z3zd BYsUgsXHypsdoud+aE7MeCy7fE8n3mhoyMQQRBkOMFJ7RsG6wAE= =S/he -----END PGP SIGNATURE----- Merge tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "These are cleanups for architecture specific header files: - the comments in include/linux/syscalls.h have gone out of sync and are really pointless, so these get removed - The asm/bitsperlong.h header no longer needs to be architecture specific on modern compilers, so use a generic version for newer architectures that use new enough userspace compilers - A cleanup for virt_to_pfn/virt_to_bus to have proper type checking, forcing the use of pointers" * tag 'asm-generic-6.5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: syscalls: Remove file path comments from headers tools arch: Remove uapi bitsperlong.h of hexagon and microblaze asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch m68k/mm: Make pfn accessors static inlines arm64: memory: Make virt_to_pfn() a static inline ARM: mm: Make virt_to_pfn() a static inline asm-generic/page.h: Make pfn accessors static inlines xen/netback: Pass (void *) to virt_to_page() netfs: Pass a pointer to virt_to_page() cifs: Pass a pointer to virt_to_page() in cifsglob cifs: Pass a pointer to virt_to_page() riscv: mm: init: Pass a pointer to virt_to_page() ARC: init: Pass a pointer to virt_to_pfn() in init m68k: Pass a pointer to virt_to_pfn() virt_to_page() fs/proc/kcore.c: Pass a pointer to virt_addr_valid()	2023-07-06 10:06:04 -07:00
Linus Torvalds	3a8a670eee	Networking changes for 6.5. Core ---- - Rework the sendpage & splice implementations. Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is. Remove the MSG_SENDPAGE_NOTLAST flag completely. - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid. - Enable socket busy polling with CONFIG_RT. - Improve reliability and efficiency of reporting for ref_tracker. - Auto-generate a user space C library for various Netlink families. Protocols --------- - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2]. - Use per-VMA locking for "page-flipping" TCP receive zerocopy. - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags. - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative. - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO). - Avoid waking up applications using TLS sockets until we have a full record. - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring. - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address. - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch. - PPPoE: make number of hash bits configurable. - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig). 
- Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge). - Support matching on Connectivity Fault Management (CFM) packets. - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug. - HSR: don't enable promiscuous mode if device offloads the proto. - Support active scanning in IEEE 802.15.4. - Continue work on Multi-Link Operation for WiFi 7. BPF --- - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators. - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer should be, without writing anything. - Accept dynptr memory as memory arguments passed to helpers. - Add routing table ID to bpf_fib_lookup BPF helper. - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands. - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only). - Show target_{obj,btf}_id in tracing link fdinfo. - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any() kfuncs Netfilter --------- - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value. - Increase ip_vs_conn_tab_bits range for 64BIT builds. - Allow updating size of a set. - Improve NAT tuple selection when connection is closing. 
Driver API ---------- - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out). - Support configuring rate selection pins of SFP modules. - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines. - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer. - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio). - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message. - Split devlink instance and devlink port operations. New hardware / drivers ---------------------- - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers ------- - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse 
Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3 MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07 IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/ fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp 
J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7 yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/ JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4= =9i4I -----END PGP SIGNATURE----- Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. 
Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. 
But some protos allow passing a NULL buffer to discover what the output buffer should be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. 
packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell 
(mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." 
phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ...	2023-06-28 16:43:10 -07:00
Linus Torvalds	b19edac599	nolibc updates for v6.5 o Add stackprotector support. o Fix RISC-V load-store instruction syntax to support 32-bit binaries, plus fixes for generic 32-bit support. o Fix use of s390 sys_fork(). o Add my_syscall6() for ARM. o Support different platforms having different errno definitions. o Fix ppoll/ppoll_time64 arguments (add the fifth argument). o Force use of little endian on MIPS. o Improved testing, for example, better handling of different compilers and compiler versions, comparing nolibc behavior to that of libc, and additional test cases. o Improve syntax and header ordering. o Use existing <linux/reboot.h> instead of redefining constants. o Add syscall(). -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmSUv78THHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jIUUD/9kEmF9BgeXerrsOx3omRgt7DbZR1Kj UYEI0mzydqQc92ZuS87VsM56rOA5SUDUDCPgTkkPCpo1anqo22+9FfFLU6M7EoEJ CNISlLtb7S1MdM9hND0RlxKoHxthcVcpUThVzAGMmuTNukJudwVBr085iiOS20VO taR5oHbPEE6pMmhbEsurmUHwTaqeCnuSZqmUoHnzatOidRByZDL7/mPr8y+lhtwo MP0wkS9ie6OTs7shH2/tt683ZY/v/JZnoOmokl7YxN6vsWeTxX7H3W4jdSGrPqW5 H+OaMVZV5QPG3EFN6MhvdMSAGWLXohMtMuSLc/BACwJ8u073LvJgJHoBahiVPXn7 y0bJbZbnXvkpp+Hqxh4argarwtQum3KAUrNLO/vIWSjJN1HbT0rhc1sRhAM+cta8 3a2nSsGf4xW8ToHgg2Q9PNzJSHxtIX1LxSEboS0IyRSYsdUS9E8gxugVIfyH9Rle gyasoSjepqwLVz6JnWiFIizHLPpEc22a3wSoRm6MzRKFaY+f8+KW6si7GgmSNmdA LJk5tid+2Unjz7BhXJ14XHRBpHYdQRQ4uA42EcUSc1CFc4/0rodGJ0hi03SXDGWY dH11x/yKW54lWZqyYUA/KAcJm8jwCFIWfGRvY9DHrA0Sh5aEyeNH3Brx1iITqnht svgWtwUsBJYIMQ== =Is+H -----END PGP SIGNATURE----- Merge tag 'nolibc.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc updates from Paul McKenney: - Add stackprotector support - Fix RISC-V load-store instruction syntax to support 32-bit binaries, plus fixes for generic 32-bit support - Fix use of s390 sys_fork() - Add my_syscall6() for ARM - Support different platforms having different errno definitions - Fix ppoll/ppoll_time64 arguments (add 
the fifth argument) - Force use of little endian on MIPS - Improved testing, for example, better handling of different compilers and compiler versions, comparing nolibc behavior to that of libc, and additional test cases - Improve syntax and header ordering - Use existing <linux/reboot.h> instead of redefining constants - Add syscall() * tag 'nolibc.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (53 commits) selftests/nolibc: make sure gcc always use little endian on MIPS selftests/nolibc: also count skipped and failed tests in output selftests/nolibc: add new gettimeofday test cases selftests/nolibc: remove gettimeofday_bad1/2 completely selftests/nolibc: support two errnos with EXPECT_SYSER2() tools/nolibc: open: fix up compile warning for arm tools/nolibc: arm: add missing my_syscall6 selftests/nolibc: use INT_MAX instead of __INT_MAX__ selftests/nolibc: not include limits.h for nolibc selftests/nolibc: fix up compile warning with glibc on x86_64 selftests/nolibc: allow specify extra arguments for qemu selftests/nolibc: remove test gettimeofday_null tools/nolibc: ensure fast64 integer types have 64 bits selftests/nolibc: test_fork: fix up duplicated print tools/nolibc: ppoll/ppoll_time64: add a missing argument selftests/nolibc: remove the duplicated gettimeofday_bad2 selftests/nolibc: print name instead of number for EOVERFLOW tools/nolibc: support nanoseconds in stat() selftests/nolibc: prevent coredumps during test execution tools/nolibc: add support for prctl() ...	2023-06-27 10:56:41 -07:00
Jakub Kicinski	a685d0df75	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZJX+ygAKCRDbK58LschI g0/2AQDHg12smf9mPfK9wOFDNRIIX8r2iufB8LUFQMzCwltN6gEAkAdkAyfbof7P TMaNUiHABijAFtChxoSI35j3OOSRrwE= =GJgN -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-06-23 We've added 49 non-merge commits during the last 24 day(s) which contain a total of 70 files changed, 1935 insertions(+), 442 deletions(-). The main changes are: 1) Extend bpf_fib_lookup helper to allow passing the route table ID, from Louis DeLosSantos. 2) Fix regsafe() in verifier to call check_ids() for scalar registers, from Eduard Zingerman. 3) Extend the set of cpumask kfuncs with bpf_cpumask_first_and() and a rework of bpf_cpumask_any() kfuncs. Additionally, add selftests, from David Vernet. 4) Fix socket lookup BPF helpers for tc/XDP to respect VRF bindings, from Gilad Sever. 5) Change bpf_link_put() to use workqueue unconditionally to fix it under PREEMPT_RT, from Sebastian Andrzej Siewior. 6) Follow-ups to address issues in the bpf_refcount shared ownership implementation, from Dave Marchevsky. 7) A few general refactorings to BPF map and program creation permissions checks which were part of the BPF token series, from Andrii Nakryiko. 8) Various fixes for benchmark framework and add a new benchmark for BPF memory allocator to BPF selftests, from Hou Tao. 9) Documentation improvements around iterators and trusted pointers, from Anton Protopopov. 10) Small cleanup in verifier to improve allocated object check, from Daniel T. Lee. 11) Improve performance of bpf_xdp_pointer() by avoiding access to shared_info when XDP packet does not have frags, from Jesper Dangaard Brouer. 12) Silence a harmless syzbot-reported warning in btf_type_id_size(), from Yonghong Song. 
13) Remove duplicate bpfilter_umh_cleanup in favor of umd_cleanup_helper, from Jarkko Sakkinen. 14) Fix BPF selftests build for resolve_btfids under custom HOSTCFLAGS, from Viktor Malik. tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits) bpf, docs: Document existing macros instead of deprecated bpf, docs: BPF Iterator Document selftests/bpf: Fix compilation failure for prog vrf_socket_lookup selftests/bpf: Add vrf_socket_lookup tests bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint bpf: Factor out socket lookup functions for the TC hookpoint. selftests/bpf: Set the default value of consumer_cnt as 0 selftests/bpf: Ensure that next_cpu() returns a valid CPU number selftests/bpf: Output the correct error code for pthread APIs selftests/bpf: Use producer_cnt to allocate local counter array xsk: Remove unused inline function xsk_buff_discard() bpf: Keep BPF_PROG_LOAD permission checks clear of validations bpf: Centralize permissions checks for all BPF map types bpf: Inline map creation logic in map_create() function bpf: Move unprivileged checks into map_create() and bpf_prog_load() bpf: Remove in_atomic() from bpf_link_put(). selftests/bpf: Verify that check_ids() is used for scalars in regsafe() bpf: Verify scalar ids mapping in regsafe() using check_ids() selftests/bpf: Check if mark_chain_precision() follows scalar ids ... ==================== Link: https://lore.kernel.org/r/20230623211256.8409-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 14:52:28 -07:00
Sohil Mehta	4dd595c34c	syscalls: Remove file path comments from headers Source file locations for syscall definitions can change over a period of time. File paths in comments get stale and are hard to maintain long term. Also, their usefulness is questionable since it would be easier to locate a syscall definition using the SYSCALL_DEFINEx() macro. Remove all source file path comments from the syscall headers. Also, equalize the uneven line spacing (some of which is introduced due to the deletions). Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2023-06-22 17:10:09 +02:00
Tiezhu Yang	8386f58f8d	asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch Now we specify the minimal version of GCC as 5.1 and Clang/LLVM as 11.0.0 in Documentation/process/changes.rst, __CHAR_BIT__ and __SIZEOF_LONG__ are usable, it is probably fine to unify the definition of __BITS_PER_LONG as (__CHAR_BIT__ * __SIZEOF_LONG__) in asm-generic uapi bitsperlong.h. In order to keep safe and avoid regression, only unify uapi bitsperlong.h for some archs such as arm64, riscv and loongarch which are using newer toolchains that have the definitions of __CHAR_BIT__ and __SIZEOF_LONG__. Suggested-by: Xi Ruoyao <xry111@xry111.site> Link: https://lore.kernel.org/all/d3e255e4746de44c9903c4433616d44ffcf18d1b.camel@xry111.site/ Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-arch/a3a4f48a-07d4-4ed9-bc53-5d383428bdd2@app.fastmail.com/ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2023-06-22 17:04:36 +02:00
Alexander Mikhalitsyn	7b26952a91	net: core: add getsockopt SO_PEERPIDFD Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd. This thing is direct analog of SO_PEERCRED which allows to get plain PID. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Stanislav Fomichev <sdf@google.com> Cc: bpf@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Tested-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-12 10:45:50 +01:00
Alexander Mikhalitsyn	5e2ff6704a	scm: add SO_PASSPIDFD and SCM_PIDFD Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS, but it contains pidfd instead of plain pid, which allows programmers not to care about PID reuse problem. We mask SO_PASSPIDFD feature if CONFIG_UNIX is not builtin because it depends on a pidfd_prepare() API which is not exported to the kernel modules. Idea comes from UAPI kernel group: https://uapi-group.org/kernel-features/ Big thanks to Christian Brauner and Lennart Poettering for productive discussions about this. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Tested-by: Luca Boccassi <bluca@debian.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-12 10:45:49 +01:00
Zhangjin Wu	f62ec079d0	tools/nolibc: open: fix up compile warning for arm In function ‘open’: nolibc/sysroot/arm/include/sys.h:919:23: warning: ‘mode_t’ {aka ‘short unsigned int’} is promoted to ‘int’ when passed through ‘...’ 919 | mode = va_arg(args, mode_t); | ^ nolibc/sysroot/arm/include/sys.h:919:23: note: (so you should pass ‘int’ not ‘mode_t’ {aka ‘short unsigned int’} to ‘va_arg’) nolibc/sysroot/arm/include/sys.h:919:23: note: if this code is reached, the program will abort Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:10 -07:00
Zhangjin Wu	646ff7c7ed	tools/nolibc: arm: add missing my_syscall6 This is required by the coming removal of the oldselect and newselect support. pselect6/pselect6_time64 will be used unconditionally, they have 6 arguments. Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-riscv/bf3e07c1-75f5-425b-9124-f3f2b230e63a@app.fastmail.com/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:09 -07:00
Zhangjin Wu	bd27fef329	selftests/nolibc: not include limits.h for nolibc When compile nolibc-test.c with 2.31 glibc, we got such error: In file included from /usr/riscv64-linux-gnu/include/sys/cdefs.h:452, from /usr/riscv64-linux-gnu/include/features.h:461, from /usr/riscv64-linux-gnu/include/bits/libc-header-start.h:33, from /usr/riscv64-linux-gnu/include/limits.h:26, from /usr/lib/gcc-cross/riscv64-linux-gnu/9/include/limits.h:194, from /usr/lib/gcc-cross/riscv64-linux-gnu/9/include/syslimits.h:7, from /usr/lib/gcc-cross/riscv64-linux-gnu/9/include/limits.h:34, from /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/nolibc-test.c:6: /usr/riscv64-linux-gnu/include/bits/wordsize.h:28:3: error: #error "rv32i-based targets are not supported" 28 | # error "rv32i-based targets are not supported" Glibc (>= 2.33) commit 5b6113d62efa ("RISC-V: Support the 32-bit ABI implementation") fixed up above error. As suggested by Thomas, defining INT_MIN/INT_MAX for nolibc can remove the including of limits.h, and therefore no above error. of course, the other libcs still require limits.h, move it to the right place. The LONG_MIN/LONG_MAX are also defined too. Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/linux-riscv/09d60dc2-e298-4c22-8e2f-8375861bd9be@t-8ch.de/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:09 -07:00
Thomas Weißschuh	f9bf5944d3	tools/nolibc: ensure fast64 integer types have 64 bits On 32bit platforms size_t is not enough to represent [u]int_fast64_t. Fixes: `3e9fd4e9a1` ("tools/nolibc: add integer types and integer limit macros") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:09 -07:00
Zhangjin Wu	0dd2fdbfa5	tools/nolibc: ppoll/ppoll_time64: add a missing argument The ppoll and ppoll_time64 syscalls have 5 arguments, but we only provide 4, align with kernel and add the missing sigsetsize argument. Because the sigmask is NULL, the last sigsetsize argument is ignored, keep it as 0 here is safe enough. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:09 -07:00
Thomas Weißschuh	87b9fa66af	tools/nolibc: support nanoseconds in stat() Keep backwards compatibility through unions. The compatibility macros like #define st_atime st_atim.tv_sec as documented in stat(3type) don't work for nolibc because it would break with other stat-like structures that contain the field st_atime. The stx_atime, stx_mtime, stx_ctime are in type of 'struct statx_timestamp', which is incompatible with 'struct timespec', should be converted explicitly. /* include/uapi/linux/stat.h */ struct statx_timestamp { __s64 tv_sec; __u32 tv_nsec; __s32 __reserved; }; /* include/uapi/linux/time.h */ struct timespec { __kernel_old_time_t tv_sec; /* seconds */ long tv_nsec; /* nanoseconds */ }; Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/linux-riscv/3a3edd48-1ace-4c89-89e8-9c594dd1b3c9@t-8ch.de/ Co-authored-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> [wt: squashed Zhangjin & Thomas' patches into one to preserve "bisectability"] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:09 -07:00
Thomas Weißschuh	208aa9d94c	tools/nolibc: add support for prctl() It will be used to disable core dumps from the child spawned to validate the stack protector functionality. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:09 -07:00
Thomas Weißschuh	79d8d4cad2	tools/nolibc: s390: disable stackprotector in _start s390 does not support the "global" stack protector mode that is implemented in nolibc. Now that nolibc detects if stack protectors are enabled at runtime it could happen that a future compiler does indeed use global mode on and nolibc would compile but segfault at runtime. To avoid this hypothetic case and to align s390 with the other architectures disable stack protectors when compiling _start(). Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:09 -07:00
Thomas Weißschuh	e76b70dec9	tools/nolibc: fix segfaults on compilers without attribute no_stack_protector Not all compilers, notably GCC < 10, have support for __attribute__((no_stack_protector)). Fall back to a mechanism that also works there. Tested with GCC 9.5.0 from kernel.org crosstools. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	818924d129	tools/nolibc: add autodetection for stackprotector support The stackprotector support in nolibc should be enabled iff it is also enabled in the compiler. Use the preprocessor defines added by gcc and clang if stackprotector support is enable to automatically do so in nolibc. This completely removes the need for any user-visible API. To avoid inlining the lengthy preprocessor check into every user introduce a new header compiler.h that abstracts the logic away. As the define NOLIBC_STACKPROTECTOR is now not user-relevant anymore prefix it with an underscore. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230520133237.GA27501@1wt.eu/ Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	e21a2eef74	tools/nolibc: reformat list of headers to be installed This makes it easier to add and remove more entries in the future without creating spurious diff hunks. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	88fc7eb54e	tools/nolibc: ensure stack protector guard is never zero The all-zero pattern is one of the more probable out-of-bound writes so add a special case to not accidentally accept it. Also it enables the reliable detection of stack protector initialization during testing. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	7a9b234520	tools/nolibc: x86_64: disable stack protector for _start This was forgotten in the original submission. It is unknown why it worked for x86_64 on some compiler without this attribute. Reported-by: Willy Tarreau <w@1wt.eu> Closes: https://lore.kernel.org/lkml/20230520133237.GA27501@1wt.eu/ Fixes: `0d8c461adb` ("tools/nolibc: x86_64: add stackprotector support") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	659ee30f33	tools/nolibc: fix typo pint -> point Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	56d294a50c	tools/nolibc: riscv: add stackprotector support Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	3da0de377b	tools/nolibc: mips: add stackprotector support Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	ca2d043714	tools/nolibc: loongarch: add stackprotector support Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	ed6c0d89bb	tools/nolibc: arm: add stackprotector support Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	c1e30f7d38	tools/nolibc: aarch64: add stackprotector support Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:08 -07:00
Thomas Weißschuh	53fcfafa8c	tools/nolibc/unistd: add syscall() syscall() is used by "normal" libcs to allow users to directly call syscalls. By having the same syntax inside nolibc users can more easily write code that works with different libcs. The macro logic is adapted from systemtaps STAP_PROBEV() macro that is released in the public domain / CC0. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:07 -07:00
Zhangjin Wu	c22c7c81af	tools/nolibc: riscv: Fix up load/store instructions for rv32 When compile nolibc application for rv32, we got such errors: nolibc/sysroot/riscv/include/arch.h:190: Error: unrecognized opcode `ld a4,0(a3)' nolibc/sysroot/riscv/include/arch.h:194: Error: unrecognized opcode `sd a3,%lo(_auxv)(a4)' nolibc/sysroot/riscv/include/arch.h:196: Error: unrecognized opcode `sd a2,%lo(environ)(a3)' Refer to arch/riscv/include/asm/asm.h and add REG_L/REG_S macros here to let rv32 uses its own lw/sw instructions. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:07 -07:00
Thomas Weißschuh	72ffbc6784	tools/nolibc: remove LINUX_REBOOT_ constants The same constants and some more have been exposed to userspace via linux/reboot.h for a long time. To avoid conflicts and trim down nolibc a bit drop the custom definitions. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:07 -07:00
Thomas Weißschuh	404fa87c0e	tools/nolibc: s390: provide custom implementation for sys_fork On s390 the first two arguments to the clone() syscall are swapped, as documented in clone(2). Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Cc: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:07 -07:00
Thomas Weißschuh	fddc8f81f1	tools/nolibc: use C89 comment syntax Most of nolibc is already using C89 comments. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:07 -07:00
Thomas Weißschuh	0738c2d7bf	tools/nolibc: use __inline__ syntax When building in strict C89 mode the "inline" keyword is unknown. While "__inline__" is non-standard it is used by the kernel headers themselves. So the used compilers would have to support it or the users shim it with a #define. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:07 -07:00
Thomas Weißschuh	7f291cfa90	tools/nolibc: use standard __asm__ statements Most of the code was migrated to C99-conformant __asm__ statements before. It seems string.h was missed. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-06-09 11:46:07 -07:00


1851 Commits