OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Toke Høiland-Jørgensen	4f33ddb4e3	libbpf: Propagate EPERM to caller on program load When loading an eBPF program, libbpf overrides the return code for EPERM errors instead of returning it to the caller. This makes it hard to figure out what went wrong on load. In particular, EPERM is returned when the system rlimit is too low to lock the memory required for the BPF program. Previously, this was somewhat obscured because the rlimit error would be hit on map creation (which does return it correctly). However, since maps can now be reused, object load can proceed all the way to loading programs without hitting the error; propagating it even in this case makes it possible for the caller to react appropriately (and, e.g., attempt to raise the rlimit before retrying). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/157333184946.88376.11768171652794234561.stgit@toke.dk	2019-11-10 19:26:30 -08:00
Toke Høiland-Jørgensen	9c4e395a1e	selftests/bpf: Add tests for automatic map unpinning on load failure This add tests for the different variations of automatic map unpinning on load failure. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/157333184838.88376.8243704248624814775.stgit@toke.dk	2019-11-10 19:26:30 -08:00
Toke Høiland-Jørgensen	ec6d5f47bf	libbpf: Unpin auto-pinned maps if loading fails Since the automatic map-pinning happens during load, it will leave pinned maps around if the load fails at a later stage. Fix this by unpinning any pinned maps on cleanup. To avoid unpinning pinned maps that were reused rather than newly pinned, add a new boolean property on struct bpf_map to keep track of whether that map was reused or not; and only unpin those maps that were not reused. Fixes: `57a00f4164` ("libbpf: Add auto-pinning of maps when loading BPF objects") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/157333184731.88376.9992935027056165873.stgit@toke.dk	2019-11-10 19:26:30 -08:00
Daniel T. Lee	451d1dc886	samples: bpf: update map definition to new syntax BTF-defined map Since, the new syntax of BTF-defined map has been introduced, the syntax for using maps under samples directory are mixed up. For example, some are already using the new syntax, and some are using existing syntax by calling them as 'legacy'. As stated at commit `abd29c9314` ("libbpf: allow specifying map definitions using BTF"), the BTF-defined map has more compatablility with extending supported map definition features. The commit doesn't replace all of the map to new BTF-defined map, because some of the samples still use bpf_load instead of libbpf, which can't properly create BTF-defined map. This will only updates the samples which uses libbpf API for loading bpf program. (ex. bpf_prog_load_xattr) Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-11-08 19:21:52 -08:00
Daniel T. Lee	afbe3c27d9	samples: bpf: Update outdated error message Currently, under samples, several methods are being used to load bpf program. Since using libbpf is preferred solution, lots of previously used 'load_bpf_file' from bpf_load are replaced with 'bpf_prog_load_xattr' from libbpf. But some of the error messages still show up as 'load_bpf_file' instead of 'bpf_prog_load_xattr'. This commit fixes outdated errror messages under samples and fixes some code style issues. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191107005153.31541-2-danieltimlee@gmail.com	2019-11-08 19:19:43 -08:00
Martin KaFai Lau	ed5941af3f	bpf: Add cb access in kfree_skb test Access the skb->cb[] in the kfree_skb test. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191107180905.4097871-1-kafai@fb.com	2019-11-07 10:59:08 -08:00
Martin KaFai Lau	7e3617a72d	bpf: Add array support to btf_struct_access This patch adds array support to btf_struct_access(). It supports array of int, array of struct and multidimensional array. It also allows using u8[] as a scratch space. For example, it allows access the "char cb[48]" with size larger than the array's element "char". Another potential use case is "u64 icsk_ca_priv[]" in the tcp congestion control. btf_resolve_size() is added to resolve the size of any type. It will follow the modifier if there is any. Please see the function comment for details. This patch also adds the "off < moff" check at the beginning of the for loop. It is to reject cases when "off" is pointing to a "hole" in a struct. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191107180903.4097702-1-kafai@fb.com	2019-11-07 10:59:08 -08:00
Daniel Borkmann	30ee348c12	Merge branch 'bpf-libbpf-fixes' Andrii Nakryiko says: ==================== Github's mirror of libbpf got LGTM and Coverity statis analysis running against it and spotted few real bugs and few potential issues. This patch series fixes found issues. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-11-07 16:20:41 +01:00
Andrii Nakryiko	98e527af30	libbpf: Improve handling of corrupted ELF during map initialization If we get ELF file with "maps" section, but no symbols pointing to it, we'll end up with division by zero. Add check against this situation and exit early with error. Found by Coverity scan against Github libbpf sources. Fixes: `bf82927125` ("libbpf: refactor map initialization") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107020855.3834758-6-andriin@fb.com	2019-11-07 16:20:38 +01:00
Andrii Nakryiko	994021a7e0	libbpf: Make btf__resolve_size logic always check size error condition Perform size check always in btf__resolve_size. Makes the logic a bit more robust against corrupted BTF and silences LGTM/Coverity complaining about always true (size < 0) check. Fixes: `69eaab04c6` ("btf: extract BTF type size calculation") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107020855.3834758-5-andriin@fb.com	2019-11-07 16:20:38 +01:00
Andrii Nakryiko	dd3ab12637	libbpf: Fix another potential overflow issue in bpf_prog_linfo Fix few issues found by Coverity and LGTM. Fixes: `b053b439b7` ("bpf: libbpf: bpftool: Print bpf_line_info during prog dump") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107020855.3834758-4-andriin@fb.com	2019-11-07 16:20:38 +01:00
Andrii Nakryiko	4ee1135615	libbpf: Fix potential overflow issue Fix a potential overflow issue found by LGTM analysis, based on Github libbpf source code. Fixes: `3d65014146` ("bpf: libbpf: Add btf_line_info support to libbpf") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107020855.3834758-3-andriin@fb.com	2019-11-07 16:20:37 +01:00
Andrii Nakryiko	3dc5e05982	libbpf: Fix memory leak/double free issue Coverity scan against Github libbpf code found the issue of not freeing memory and leaving already freed memory still referenced from bpf_program. Fix it by re-assigning successfully reallocated memory sooner. Fixes: `2993e0515b` ("tools/bpf: add support to read .BTF.ext sections") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107020855.3834758-2-andriin@fb.com	2019-11-07 16:20:37 +01:00
Andrii Nakryiko	9656b346b2	libbpf: Fix negative FD close() in xsk_setup_xdp_prog() Fix issue reported by static analysis (Coverity). If bpf_prog_get_fd_by_id() fails, xsk_lookup_bpf_maps() will fail as well and clean-up code will attempt close() with fd=-1. Fix by checking bpf_prog_get_fd_by_id() return result and exiting early. Fixes: `10a13bb40e` ("libbpf: remove qidconf and better support external bpf programs.") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107054059.313884-1-andriin@fb.com	2019-11-07 16:15:27 +01:00
Ilya Leoshkevich	dab2e9eb18	s390/bpf: Remove unused SEEN_RET0, SEEN_REG_AX and ret0_ip We don't need them since commit `e1cf4befa2` ("bpf, s390x: remove ld_abs/ld_ind") and commit `a3212b8f15` ("bpf, s390x: remove obsolete exception handling from div/mod"). Also, use BIT(n) instead of 1 << n, because checkpatch says so. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107114033.90505-1-iii@linux.ibm.com	2019-11-07 16:10:01 +01:00
Ilya Leoshkevich	6ad2e1a007	s390/bpf: Wrap JIT macro parameter usages in parentheses This change does not alter JIT behavior; it only makes it possible to safely invoke JIT macros with complex arguments in the future. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107113211.90105-1-iii@linux.ibm.com	2019-11-07 16:07:28 +01:00
Ilya Leoshkevich	166f11d11f	s390/bpf: Use kvcalloc for addrs array A BPF program may consist of 1m instructions, which means JIT instruction-address mapping can be as large as 4m. s390 has FORCE_MAX_ZONEORDER=9 (for memory hotplug reasons), which means maximum kmalloc size is 1m. This makes it impossible to JIT programs with more than 256k instructions. Fix by using kvcalloc, which falls back to vmalloc for larger allocations. An alternative would be to use a radix tree, but that is not supported by bpf_prog_fill_jited_linfo. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107141838.92202-1-iii@linux.ibm.com	2019-11-07 16:05:55 +01:00
Ilya Leoshkevich	7e22077d0c	tools, bpf_asm: Warn when jumps are out of range When compiling larger programs with bpf_asm, it's possible to accidentally exceed jt/jf range, in which case it won't complain, but rather silently emit a truncated offset, leading to a "happy debugging" situation. Add a warning to help detecting such issues. It could be made an error instead, but this might break compilation of existing code (which might be working by accident). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107100349.88976-1-iii@linux.ibm.com	2019-11-07 16:01:34 +01:00
Martin KaFai Lau	85d31dd070	bpf: Account for insn->off when doing bpf_probe_read_kernel In the bpf interpreter mode, bpf_probe_read_kernel is used to read from PTR_TO_BTF_ID's kernel object. It currently missed considering the insn->off. This patch fixes it. Fixes: `2a02759ef5` ("bpf: Add support for BTF pointers to interpreter") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191107014640.384083-1-kafai@fb.com	2019-11-06 21:52:52 -08:00
Andrii Nakryiko	ed57802121	libbpf: Simplify BPF_CORE_READ_BITFIELD_PROBED usage Streamline BPF_CORE_READ_BITFIELD_PROBED interface to follow BPF_CORE_READ_BITFIELD (direct) and BPF_CORE_READ, in general, i.e., just return read result or 0, if underlying bpf_probe_read() failed. In practice, real applications rarely check bpf_probe_read() result, because it has to always work or otherwise it's a bug. So propagating internal bpf_probe_read() error from this macro hurts usability without providing real benefits in practice. This patch fixes the issue and simplifies usage, noticeable even in selftest itself. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191106201500.2582438-1-andriin@fb.com	2019-11-06 13:54:59 -08:00
Andrii Nakryiko	65a052d537	selftests/bps: Clean up removed ints relocations negative tests As part of `42765ede5c` ("selftests/bpf: Remove too strict field offset relo test cases"), few ints relocations negative (supposed to fail) tests were removed, but not completely. Due to them being negative, some leftovers in prog_tests/core_reloc.c went unnoticed. Clean them up. Fixes: `42765ede5c` ("selftests/bpf: Remove too strict field offset relo test cases") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191106173659.1978131-1-andriin@fb.com	2019-11-06 13:45:06 -08:00
Daniel Borkmann	f23c7ce341	Merge branch 'bpf-libbpf-bitfield-size-relo' Andrii Nakryiko says: ==================== This patch set adds support for reading bitfields in a relocatable manner through a set of relocations emitted by Clang, corresponding libbpf support for those relocations, as well as abstracting details into BPF_CORE_READ_BITFIELD/BPF_CORE_READ_BITFIELD_PROBED macro. We also add support for capturing relocatable field size, so that BPF program code can adjust its logic to actual amount of data it needs to operate on, even if it changes between kernels. New convenience macro is added to bpf_core_read.h (bpf_core_field_size(), in the same family of macro as bpf_core_read() and bpf_core_field_exists()). Corresponding set of selftests are added to excercise this logic and validate correctness in a variety of scenarios. Some of the overly strict logic of matching fields is relaxed to support wider variety of scenarios. See patch #1 for that. Patch #1 removes few overly strict test cases. Patch #2 adds support for bitfield-related relocations. Patch #3 adds some further adjustments to support generic field size relocations and introduces bpf_core_field_size() macro. Patch #4 tests bitfield reading. Patch #5 tests field size relocations. v1 -> v2: - added direct memory read-based macro and tests for bitfield reads. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-11-04 16:07:02 +01:00
Andrii Nakryiko	0b163565b9	selftests/bpf: Add field size relocation tests Add test verifying correctness and logic of field size relocation support in libbpf. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191101222810.1246166-6-andriin@fb.com	2019-11-04 16:06:56 +01:00
Andrii Nakryiko	8b1cb1c960	selftest/bpf: Add relocatable bitfield reading tests Add a bunch of selftests verifying correctness of relocatable bitfield reading support in libbpf. Both bpf_probe_read()-based and direct read-based bitfield macros are tested. core_reloc.c "test_harness" is extended to support raw tracepoint and new typed raw tracepoints as test BPF program types. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191101222810.1246166-5-andriin@fb.com	2019-11-04 16:06:56 +01:00
Andrii Nakryiko	94f060e984	libbpf: Add support for field size relocations Add bpf_core_field_size() macro, capturing a relocation against field size. Adjust bits of internal libbpf relocation logic to allow capturing size relocations of various field types: arrays, structs/unions, enums, etc. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191101222810.1246166-4-andriin@fb.com	2019-11-04 16:06:56 +01:00
Andrii Nakryiko	ee26dade0e	libbpf: Add support for relocatable bitfields Add support for the new field relocation kinds, necessary to support relocatable bitfield reads. Provide macro for abstracting necessary code doing full relocatable bitfield extraction into u64 value. Two separate macros are provided: - BPF_CORE_READ_BITFIELD macro for direct memory read-enabled BPF programs (e.g., typed raw tracepoints). It uses direct memory dereference to extract bitfield backing integer value. - BPF_CORE_READ_BITFIELD_PROBED macro for cases where bpf_probe_read() needs to be used to extract same backing integer value. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191101222810.1246166-3-andriin@fb.com	2019-11-04 16:06:56 +01:00
Andrii Nakryiko	42765ede5c	selftests/bpf: Remove too strict field offset relo test cases As libbpf is going to gain support for more field relocations, including field size, some restrictions about exact size match are going to be lifted. Remove test cases that explicitly test such failures. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191101222810.1246166-2-andriin@fb.com	2019-11-04 16:06:56 +01:00
David S. Miller	1574cf83c7	mlx5-updates-2019-11-01 Misc updates for mlx5 netdev and core driver 1) Steering Core: Replace CRC32 internal implementation with standard kernel lib. 2) Steering Core: Support IPv4 and IPv6 mixed matcher. 3) Steering Core: Lockless FTE read lookups 4) TC: Bit sized fields rewrite support. 5) FPGA: Standalone FPGA support. 6) SRIOV: Reset VF parameters configurations on SRIOV disable. 7) netdev: Dump WQs wqe descriptors on CQE with error events. 8) MISC Cleanups. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl28qcYACgkQSD+KveBX +j4+tgf9HM1EpNeeX/kfZGECMtxHgKuNC4NwpOExKU/OpvUxCNBZb1KXDjaeraE9 8fLvgA/T2cEHfNojJ+S9rRb64KxGw/96ieYimc9aYkIH4L5YEXYcUHj44RRMNYpI miWUQgcWABRvO5JvDMXqG+NIr7ctyodA7Qb3vlFWJvSB8KLNViPfPfPHn+oVDnxR ZBp1CWZXnlwO1ZMYQ2TDY1l0csDIx+awxkYXL3SFqJKBzheDItmQ4Ybw/yh+Mfv3 eS5DJo2rRADHsTDTKSbyaBzcTY1UEWhJW+stYlN0SP9YkH1y2Q3lDeNKQLfK4xkc YwYaCp8ZvLmOtLxwoujAIl2R5sjDpQ== =8kXy -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2019-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-11-01 Misc updates for mlx5 netdev and core driver 1) Steering Core: Replace CRC32 internal implementation with standard kernel lib. 2) Steering Core: Support IPv4 and IPv6 mixed matcher. 3) Steering Core: Lockless FTE read lookups 4) TC: Bit sized fields rewrite support. 5) FPGA: Standalone FPGA support. 6) SRIOV: Reset VF parameters configurations on SRIOV disable. 7) netdev: Dump WQs wqe descriptors on CQE with error events. 8) MISC Cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 19:23:49 -08:00
YueHaibing	a37ac8ae66	mISDN: remove unused variable 'faxmodulation_s' drivers/isdn/hardware/mISDN/mISDNisar.c:30:17: warning: faxmodulation_s defined but not used [-Wunused-const-variable=] It is never used, so can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 19:10:30 -08:00
Vincent Cheng	3a6ba7dc77	ptp: Add a ptp clock driver for IDT ClockMatrix. The IDT ClockMatrix (TM) family includes integrated devices that provide eight PLL channels. Each PLL channel can be independently configured as a frequency synthesizer, jitter attenuator, digitally controlled oscillator (DCO), or a digital phase lock loop (DPLL). Typically these devices are used as timing references and clock sources for PTP applications. This patch adds support for the device. Co-developed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:35:40 -08:00
Vincent Cheng	5c5e7aac63	dt-bindings: ptp: Add device tree binding for IDT ClockMatrix based PTP clock Add device tree binding doc for the IDT ClockMatrix PTP clock. Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:35:40 -08:00
Francesco Ruggeri	fac6fce9bd	net: icmp6: provide input address for traceroute6 traceroute6 output can be confusing, in that it shows the address that a router would use to reach the sender, rather than the address the packet used to reach the router. Consider this case: ------------------------ N2 \| \| ------ ------ N3 ---- \| R1 \| \| R2 \|------\|H2\| ------ ------ ---- \| \| ------------------------ N1 \| ---- \|H1\| ---- where H1's default route is through R1, and R1's default route is through R2 over N2. traceroute6 from H1 to H2 shows R2's address on N1 rather than on N2. The script below can be used to reproduce this scenario. traceroute6 output without this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.036 ms 0.008 ms 0.006 ms 2 2000:101::2 (2000:101::2) 0.011 ms 0.008 ms 0.007 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.010 ms 0.009 ms traceroute6 output with this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.056 ms 0.019 ms 0.006 ms 2 2000:102::2 (2000:102::2) 0.013 ms 0.008 ms 0.008 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.009 ms 0.009 ms #!/bin/bash # # ------------------------ N2 # \| \| # ------ ------ N3 ---- # \| R1 \| \| R2 \|------\|H2\| # ------ ------ ---- # \| \| # ------------------------ N1 # \| # ---- # \|H1\| # ---- # # N1: 2000:101::/64 # N2: 2000:102::/64 # N3: 2000:103::/64 # # R1's host part of address: 1 # R2's host part of address: 2 # H1's host part of address: 3 # H2's host part of address: 4 # # For example: # the IPv6 address of R1's interface on N2 is 2000:102::1/64 # # Nets are implemented by macvlan interfaces (bridge mode) over # dummy interfaces. # # Create net namespaces ip netns add host1 ip netns add host2 ip netns add rtr1 ip netns add rtr2 # Create nets ip link add net1 type dummy; ip link set net1 up ip link add net2 type dummy; ip link set net2 up ip link add net3 type dummy; ip link set net3 up # Add interfaces to net1, move them to their nemaspaces ip link add link net1 dev host1net1 type macvlan mode bridge ip link set host1net1 netns host1 ip link add link net1 dev rtr1net1 type macvlan mode bridge ip link set rtr1net1 netns rtr1 ip link add link net1 dev rtr2net1 type macvlan mode bridge ip link set rtr2net1 netns rtr2 # Add interfaces to net2, move them to their nemaspaces ip link add link net2 dev rtr1net2 type macvlan mode bridge ip link set rtr1net2 netns rtr1 ip link add link net2 dev rtr2net2 type macvlan mode bridge ip link set rtr2net2 netns rtr2 # Add interfaces to net3, move them to their nemaspaces ip link add link net3 dev rtr2net3 type macvlan mode bridge ip link set rtr2net3 netns rtr2 ip link add link net3 dev host2net3 type macvlan mode bridge ip link set host2net3 netns host2 # Configure interfaces and routes in host1 ip netns exec host1 ip link set lo up ip netns exec host1 ip link set host1net1 up ip netns exec host1 ip -6 addr add 2000:101::3/64 dev host1net1 ip netns exec host1 ip -6 route add default via 2000:101::1 # Configure interfaces and routes in rtr1 ip netns exec rtr1 ip link set lo up ip netns exec rtr1 ip link set rtr1net1 up ip netns exec rtr1 ip -6 addr add 2000:101::1/64 dev rtr1net1 ip netns exec rtr1 ip link set rtr1net2 up ip netns exec rtr1 ip -6 addr add 2000:102::1/64 dev rtr1net2 ip netns exec rtr1 ip -6 route add default via 2000:102::2 ip netns exec rtr1 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in rtr2 ip netns exec rtr2 ip link set lo up ip netns exec rtr2 ip link set rtr2net1 up ip netns exec rtr2 ip -6 addr add 2000:101::2/64 dev rtr2net1 ip netns exec rtr2 ip link set rtr2net2 up ip netns exec rtr2 ip -6 addr add 2000:102::2/64 dev rtr2net2 ip netns exec rtr2 ip link set rtr2net3 up ip netns exec rtr2 ip -6 addr add 2000:103::2/64 dev rtr2net3 ip netns exec rtr2 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in host2 ip netns exec host2 ip link set lo up ip netns exec host2 ip link set host2net3 up ip netns exec host2 ip -6 addr add 2000:103::4/64 dev host2net3 ip netns exec host2 ip -6 route add default via 2000:103::2 # Ping host2 from host1 ip netns exec host1 ping6 -c5 2000:103::4 # Traceroute host2 from host1 ip netns exec host1 traceroute6 2000:103::4 # Delete nets ip link del net3 ip link del net2 ip link del net1 # Delete namespaces ip netns del rtr2 ip netns del rtr1 ip netns del host2 ip netns del host1 Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Original-patch-by: Honggang Xu <hxu@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:26:53 -08:00
Tuong Lien	06e7c70c6e	tipc: improve message bundling algorithm As mentioned in commit `e95584a889` ("tipc: fix unlimited bundling of small messages"), the current message bundling algorithm is inefficient that can generate bundles of only one payload message, that causes unnecessary overheads for both the sender and receiver. This commit re-designs the 'tipc_msg_make_bundle()' function (now named as 'tipc_msg_try_bundle()'), so that when a message comes at the first place, we will just check & keep a reference to it if the message is suitable for bundling. The message buffer will be put into the link backlog queue and processed as normal. Later on, when another one comes we will make a bundle with the first message if possible and so on... This way, a bundle if really needed will always consist of at least two payload messages. Otherwise, we let the first buffer go its way without any need of bundling, so reduce the overheads to zero. Moreover, since now we have both the messages in hand, we can even optimize the 'tipc_msg_bundle()' function, make bundle of a very large (size ~ MSS) and small messages which is not with the current algorithm e.g. [1400-byte message] + [10-byte message] (MTU = 1500). Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:26:15 -08:00
Francesco Ruggeri	2adf81c0f7	net: icmp: use input address in traceroute Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the primary address of the interface the packet was received on, even if the path goes through a secondary address. In the example: 1.0.3.1/24 ---- 1.0.1.3/24 1.0.1.1/24 ---- 1.0.2.1/24 1.0.2.4/24 ---- \|H1\|--------------------------\|R1\|--------------------------\|H2\| ---- N1 ---- N2 ---- where 1.0.3.1/24 is R1's primary address on N1, traceroute from H1 to H2 returns: traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets 1 1.0.3.1 (1.0.3.1) 0.018 ms 0.006 ms 0.006 ms 2 1.0.2.4 (1.0.2.4) 0.021 ms 0.007 ms 0.007 ms After applying this patch, it returns: traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets 1 1.0.1.1 (1.0.1.1) 0.033 ms 0.007 ms 0.006 ms 2 1.0.2.4 (1.0.2.4) 0.011 ms 0.007 ms 0.007 ms Original-patch-by: Bill Fenner <fenner@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:25:18 -08:00
David S. Miller	c219a16622	Merge branch 'optimize-openvswitch-flow-looking-up' Tonghao Zhang says: ==================== optimize openvswitch flow looking up This series patch optimize openvswitch for performance or simplify codes. Patch 1, 2, 4: Port Pravin B Shelar patches to linux upstream with little changes. Patch 5, 6, 7: Optimize the flow looking up and simplify the flow hash. Patch 8, 9: are bugfix. The performance test is on Intel Xeon E5-2630 v4. The test topology is show as below: +-----------------------------------+ \| +---------------------------+ \| \| \| eth0 ovs-switch eth1 \| \| Host0 \| +---------------------------+ \| +-----------------------------------+ ^ \| \| \| \| \| \| \| \| v +-----+----+ +----+-----+ \| netperf \| Host1 \| netserver\| Host2 +----------+ +----------+ We use netperf send the 64B packets, and insert 255+ flow-mask: $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2 ... $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2 $ $ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18 * Without series patch, throughput 8.28Mbps * With series patch, throughput 46.05Mbps v6: some coding style fixes v5: rewrite patch 8, release flow-mask when freeing flow v4: access ma->count with READ_ONCE/WRITE_ONCE API. More information, see patch 5 comments. v3: update ma point when realloc mask_array in patch 5 v2: simplify codes. e.g. use kfree_rcu instead of call_rcu ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:04 -08:00
Tonghao Zhang	eec62eadd1	net: openvswitch: simplify the ovs_dp_cmd_new use the specified functions to init resource. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:04 -08:00
Tonghao Zhang	4c76bf696a	net: openvswitch: don't unlock mutex when changing the user_features fails Unlocking of a not locked mutex is not allowed. Other kernel thread may be in critical section while we unlock it because of setting user_feature fail. Fixes: `95a7233c4` ("net: openvswitch: Set OvS recirc_id from tc chain index") Cc: Paul Blakey <paulb@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:04 -08:00
Tonghao Zhang	50b0e61b32	net: openvswitch: fix possible memleak on destroy flow-table When we destroy the flow tables which may contain the flow_mask, so release the flow mask struct. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	0a3e01371d	net: openvswitch: add likely in flow_lookup The most case *index < ma->max, and flow-mask is not NULL. We add un/likely for performance. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	515b65a4b9	net: openvswitch: simplify the flow_hash Simplify the code and remove the unnecessary BUILD_BUG_ON. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	57f7d7b916	net: openvswitch: optimize flow-mask looking up The full looking up on flow table traverses all mask array. If mask-array is too large, the number of invalid flow-mask increase, performance will be drop. One bad case, for example: M means flow-mask is valid and NULL of flow-mask means deleted. +-------------------------------------------+ \| M \| NULL \| ... \| NULL \| M\| +-------------------------------------------+ In that case, without this patch, openvswitch will traverses all mask array, because there will be one flow-mask in the tail. This patch changes the way of flow-mask inserting and deleting, and the mask array will be keep as below: there is not a NULL hole. In the fast path, we can "break" "for" (not "continue") in flow_lookup when we get a NULL flow-mask. "break" v +-------------------------------------------+ \| M \| M \| NULL \|... \| NULL \| NULL\| +-------------------------------------------+ This patch don't optimize slow or control path, still using ma->max to traverse. Slow path: * tbl_mask_array_realloc * ovs_flow_tbl_lookup_exact * flow_mask_find Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	a7f35e78e7	net: openvswitch: optimize flow mask cache hash collision Port the codes to linux upstream and with little changes. Pravin B Shelar, says: \| In case hash collision on mask cache, OVS does extra flow \| lookup. Following patch avoid it. Link: `0e6efbe271` Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	1689754de6	net: openvswitch: shrink the mask array if necessary When creating and inserting flow-mask, if there is no available flow-mask, we realloc the mask array. When removing flow-mask, if necessary, we shrink mask array. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	4bc63b1b53	net: openvswitch: convert mask list in mask array Port the codes to linux upstream and with little changes. Pravin B Shelar, says: \| mask caches index of mask in mask_list. On packet recv OVS \| need to traverse mask-list to get cached mask. Therefore array \| is better for retrieving cached mask. This also allows better \| cache replacement algorithm by directly checking mask's existence. Link: `d49fc3ff53` Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
Tonghao Zhang	04b7d136d0	net: openvswitch: add flow-mask cache for performance The idea of this optimization comes from a patch which is committed in 2014, openvswitch community. The author is Pravin B Shelar. In order to get high performance, I implement it again. Later patches will use it. Pravin B Shelar, says: \| On every packet OVS needs to lookup flow-table with every \| mask until it finds a match. The packet flow-key is first \| masked with mask in the list and then the masked key is \| looked up in flow-table. Therefore number of masks can \| affect packet processing performance. Link: `5604935e4e` Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-03 17:18:03 -08:00
David S. Miller	ae8a76fb8b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-11-02 The following pull-request contains BPF updates for your net-next tree. We've added 30 non-merge commits during the last 7 day(s) which contain a total of 41 files changed, 1864 insertions(+), 474 deletions(-). The main changes are: 1) Fix long standing user vs kernel access issue by introducing bpf_probe_read_user() and bpf_probe_read_kernel() helpers, from Daniel. 2) Accelerated xskmap lookup, from Björn and Maciej. 3) Support for automatic map pinning in libbpf, from Toke. 4) Cleanup of BTF-enabled raw tracepoints, from Alexei. 5) Various fixes to libbpf and selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-02 15:29:58 -07:00
David S. Miller	d31e95585c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-02 13:54:56 -07:00
Alexei Starovoitov	358fdb4562	Merge branch 'bpf_probe_read_user' Daniel Borkmann says: ==================== This set adds probe_read_{user,kernel}(), probe_read_str_{user,kernel}() helpers, fixes probe_write_user() helper and selftests. For details please see individual patches. Thanks! v2 -> v3: - noticed two more things that are fixed in here: - bpf uapi helper description used 'int size' for *_str helpers, now u32 - we need TASK_SIZE_MAX + guard page on x86-64 in patch 2 otherwise we'll trigger the `00c42373d3` warn as well, so full range covered now v1 -> v2: - standardize unsafe_ptr terminology in uapi header comment (Andrii) - probe_read_{user,kernel}[_str] naming scheme (Andrii) - use global data in last test case, remove relaxed_maps (Andrii) - add strict non-pagefault kernel read funcs to avoid warning in kernel probe read helpers (Alexei) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-11-02 12:45:15 -07:00
Daniel Borkmann	fa553d9b57	bpf, testing: Add selftest to read/write sockaddr from user space Tested on x86-64 and Ilya was also kind enough to give it a spin on s390x, both passing with probe_user:OK there. The test is using the newly added bpf_probe_read_user() to dump sockaddr from connect call into .bss BPF map and overrides the user buffer via bpf_probe_write_user(): # ./test_progs [...] #17 pkt_md_access:OK #18 probe_user:OK #19 prog_run_xattr:OK [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/90f449d8af25354e05080e82fc6e2d3179da30ea.1572649915.git.daniel@iogearbox.net	2019-11-02 12:45:08 -07:00
Daniel Borkmann	50f9aa44ca	bpf, testing: Convert prog tests to probe_read_{user, kernel}{, _str} helper Use probe read *_{kernel,user}{,_str}() helpers instead of bpf_probe_read() or bpf_probe_read_user_str() for program tests where appropriate. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/4a61d4b71ce3765587d8ef5cb93afa18515e5b3e.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:13 -07:00

1 2 3 4 5 ...

873508 Commits All Branches Search

873508 Commits

All Branches