OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Nathan Chancellor	098454362a	test_bpf: Fix a new clang warning about xor-ing two numbers r369217 in clang added a new warning about potential misuse of the xor operator as an exponentiation operator: ../lib/test_bpf.c:870:13: warning: result of '10 ^ 300' is 294; did you mean '1e300'? [-Wxor-used-as-pow] { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } }, ~~~^~~~~ 1e300 ../lib/test_bpf.c:870:13: note: replace expression with '0xA ^ 300' to silence this warning ../lib/test_bpf.c:870:31: warning: result of '10 ^ 300' is 294; did you mean '1e300'? [-Wxor-used-as-pow] { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } }, ~~~^~~~~ 1e300 ../lib/test_bpf.c:870:31: note: replace expression with '0xA ^ 300' to silence this warning The commit link for this new warning has some good logic behind wanting to add it but this instance appears to be a false positive. Adopt its suggestion to silence the warning but not change the code. According to the differential review link in the clang commit, GCC may eventually adopt this warning as well. Link: https://github.com/ClangBuiltLinux/linux/issues/643 Link: `920890e268` Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-20 17:07:29 +02:00
Masahiro Yamada	69ecfdaa53	bpf: add include guard to tnum.h Add a header include guard just in case. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-20 17:05:05 +02:00
Quentin Monnet	3481e64bbe	bpf: add BTF ids in procfs for file descriptors to BTF objects Implement the show_fdinfo hook for BTF FDs file operations, and make it print the id of the BTF object. This allows for a quick retrieval of the BTF id from its FD; or it can help understanding what type of object (BTF) the file descriptor points to. v2: - Do not expose data_size, only btf_id, in FD info. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-20 16:19:12 +02:00
YueHaibing	ede6bc88d6	bpf: Use PTR_ERR_OR_ZERO in xsk_map_inc() Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-20 16:03:52 +02:00
Daniel Borkmann	1f72672327	Merge branch 'bpf-af-xdp-xskmap-improvements' Björn Töpel says: ==================== This series (v5 and counting) add two improvements for the XSKMAP, used by AF_XDP sockets. 1. Automatic cleanup when an AF_XDP socket goes out of scope/is released. Instead of require that the user manually clears the "released" state socket from the map, this is done automatically. Each socket tracks which maps it resides in, and remove itself from those maps at relase. A notable implementation change, is that the sockets references the map, instead of the map referencing the sockets. Which implies that when the XSKMAP is freed, it is by definition cleared of sockets. 2. The XSKMAP did not honor the BPF_EXIST/BPF_NOEXIST flag on insert, which this patch addresses. v1->v2: Fixed deadlock and broken cleanup. (Daniel) v2->v3: Rebased onto bpf-next v3->v4: {READ, WRITE}_ONCE consistency. (Daniel) Socket release/map update race. (Daniel) v4->v5: Avoid use-after-free on XSKMAP self-assignment [1]. (Daniel) Removed redundant assignment in xsk_map_update_elem(). Variable name consistency; Use map_entry everywhere. [1] https://lore.kernel.org/bpf/20190802081154.30962-1-bjorn.topel@gmail.com/T/#mc68439e97bc07fa301dad9fc4850ed5aa392f385 ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:24:46 +02:00
Björn Töpel	36cc34358c	xsk: support BPF_EXIST and BPF_NOEXIST flags in XSKMAP The XSKMAP did not honor the BPF_EXIST/BPF_NOEXIST flags when updating an entry. This patch addresses that. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:24:45 +02:00
Björn Töpel	0402acd683	xsk: remove AF_XDP socket from map when the socket is released When an AF_XDP socket is released/closed the XSKMAP still holds a reference to the socket in a "released" state. The socket will still use the netdev queue resource, and block newly created sockets from attaching to that queue, but no user application can access the fill/complete/rx/tx queues. This results in that all applications need to explicitly clear the map entry from the old "zombie state" socket. This should be done automatically. In this patch, the sockets tracks, and have a reference to, which maps it resides in. When the socket is released, it will remove itself from all maps. Suggested-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:24:45 +02:00
Daniel Borkmann	8e46c3534a	Merge branch 'bpf-sk-storage-clone' Stanislav Fomichev says: ==================== Currently there is no way to propagate sk storage from the listener socket to a newly accepted one. Consider the following use case: fd = socket(); setsockopt(fd, SOL_IP, IP_TOS,...); /* ^^^ setsockopt BPF program triggers here and saves something * into sk storage of the listener. / listen(fd, ...); while (client = accept(fd)) { / At this point all association between listener * socket and newly accepted one is gone. New * socket will not have any sk storage attached. / } Let's add new BPF_F_CLONE flag that can be specified when creating a socket storage map. This new flag indicates that map contents should be cloned when the socket is cloned. v4: drop 'goto err' in bpf_sk_storage_clone (Yonghong Song) * add comment about race with bpf_sk_storage_map_free to the bpf_sk_storage_clone side as well (Daniel Borkmann) v3: * make sure BPF_F_NO_PREALLOC is always present when creating a map (Martin KaFai Lau) * don't call bpf_sk_storage_free explicitly, rely on sk_free_unlock_clone to do the cleanup (Martin KaFai Lau) v2: * remove spinlocks around selem_link_map/sk (Martin KaFai Lau) * BPF_F_CLONE on a map, not selem (Martin KaFai Lau) * hold a map while cloning (Martin KaFai Lau) * use BTF maps in selftests (Yonghong Song) * do proper cleanup selftests; don't call close(-1) (Yonghong Song) * export bpf_map_inc_not_zero ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:18:54 +02:00
Stanislav Fomichev	c3bbf176fb	selftests/bpf: add sockopt clone/inheritance test Add a test that calls setsockopt on the listener socket which triggers BPF program. This BPF program writes to the sk storage and sets clone flag. Make sure that sk storage is cloned for a newly accepted connection. We have two cloned maps in the tests to make sure we hit both cases in bpf_sk_storage_clone: first element (sk_storage_alloc) and non-first element(s) (selem_link_map). Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:18:54 +02:00
Stanislav Fomichev	9e819ffcfe	bpf: sync bpf.h to tools/ Sync new sk storage clone flag. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:18:54 +02:00
Stanislav Fomichev	8f51dfc73b	bpf: support cloning sk storage on accept() Add new helper bpf_sk_storage_clone which optionally clones sk storage and call it from sk_clone_lock. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:18:54 +02:00
Stanislav Fomichev	b0e4701ce1	bpf: export bpf_map_inc_not_zero Rename existing bpf_map_inc_not_zero to __bpf_map_inc_not_zero to indicate that it's caller's responsibility to do proper locking. Create and export bpf_map_inc_not_zero wrapper that properly locks map_idr_lock. Will be used in the next commit to hold a map while cloning a socket. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:18:54 +02:00
Petar Penkov	fae55527ac	selftests/bpf: fix race in test_tcp_rtt test There is a race in this test between receiving the ACK for the single-byte packet sent in the test, and reading the values from the map. This patch fixes this by having the client wait until there are no more unacknowledged packets. Before: for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \ done \| grep -c PASSED < trimmed error messages > 993 After: for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \ done \| grep -c PASSED 10000 Fixes: `b55873984d` ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB") Signed-off-by: Petar Penkov <ppenkov@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:16:25 +02:00
Andrii Nakryiko	929ffa6e9d	libbpf: relicense bpf_helpers.h and bpf_endian.h bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper definitions essential to almost every BPF program. Which makes them useful not just for selftests. To be able to expose them as part of libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR BSD-2-Clause. This patch updates licensing of those two files. Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Hechao Li <hechaol@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Adam Barth <arb@fb.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Josef Bacik <jbacik@fb.com> Acked-by: Joe Stringer <joe@wand.net.nz> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: Adrian Ratiu <adrian.ratiu@collabora.com> Acked-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Teng Qin <palmtenor@gmail.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Michal Rostecki <mrostecki@opensuse.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Sargun Dhillon <sargun@sargun.me> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:14:21 +02:00
Maxim Mikityanskiy	c14a9f633d	net: Don't call XDP_SETUP_PROG when nothing is changed Don't uninstall an XDP program when none is installed, and don't install an XDP program that has the same ID as the one already installed. dev_change_xdp_fd doesn't perform any checks in case it uninstalls an XDP program. It means that the driver's ndo_bpf can be called with XDP_SETUP_PROG asking to set it to NULL even if it's already NULL. This case happens if the user runs `ip link set eth0 xdp off` when there is no XDP program attached. The symmetrical case is possible when the user tries to set the program that is already set. The drivers typically perform some heavy operations on XDP_SETUP_PROG, so they all have to handle these cases internally to return early if they happen. This patch puts this check into the kernel code, so that all drivers will benefit from it. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:11:54 +02:00
Daniel Borkmann	c8186c8064	Merge branch 'bpf-af-xdp-wakeup' Magnus Karlsson says: ==================== This patch set adds support for a new flag called need_wakeup in the AF_XDP Tx and fill rings. When this flag is set by the driver, it means that the application has to explicitly wake up the kernel Rx (for the bit in the fill ring) or kernel Tx (for bit in the Tx ring) processing by issuing a syscall. Poll() can wake up both and sendto() will wake up Tx processing only. The main reason for introducing this new flag is to be able to efficiently support the case when application and driver is executing on the same core. Previously, the driver was just busy-spinning on the fill ring if it ran out of buffers in the HW and there were none to get from the fill ring. This approach works when the application and driver is running on different cores as the application can replenish the fill ring while the driver is busy-spinning. Though, this is a lousy approach if both of them are running on the same core as the probability of the fill ring getting more entries when the driver is busy-spinning is zero. With this new feature the driver now sets the need_wakeup flag and returns to the application. The application can then replenish the fill queue and then explicitly wake up the Rx processing in the kernel using the syscall poll(). For Tx, the flag is only set to one if the driver has no outstanding Tx completion interrupts. If it has some, the flag is zero as it will be woken up by a completion interrupt anyway. This flag can also be used in other situations where the driver needs to be woken up explicitly. As a nice side effect, this new flag also improves the Tx performance of the case where application and driver are running on two different cores as it reduces the number of syscalls to the kernel. The kernel tells user space if it needs to be woken up by a syscall, and this eliminates many of the syscalls. The Rx performance of the 2-core case is on the other hand slightly worse, since there is a need to use a syscall now to wake up the driver, instead of the driver busy-spinning. It does waste less CPU cycles though, which might lead to better overall system performance. This new flag needs some simple driver support. If the driver does not support it, the Rx flag is always zero and the Tx flag is always one. This makes any application relying on this feature default to the old behavior of not requiring any syscalls in the Rx path and always having to call sendto() in the Tx path. For backwards compatibility reasons, this feature has to be explicitly turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend that you always turn it on as it has a large positive performance impact for the one core case and does not degrade 2 core performance and actually improves it for Tx heavy workloads. Here are some performance numbers measured on my local, non-performance optimized development system. That is why you are seeing numbers lower than the ones from Björn and Jesper. 64 byte packets at 40Gbit/s line rate. All results in Mpps. Cores == 1 means that both application and driver is executing on the same core. Cores == 2 that they are on different cores. Applications need_wakeup cores txpush rxdrop l2fwd --------------------------------------------------------------- n 1 0.07 0.06 0.03 y 1 21.6 8.2 6.5 n 2 32.3 11.7 8.7 y 2 33.1 11.7 8.7 Overall, the need_wakeup flag provides the same or better performance in all the micro-benchmarks. The reduction of sendto() calls in txpush is large. Only a few per second is needed. For l2fwd, the drop is 50% for the 1 core case and more than 99.9% for the 2 core case. Do not know why I am not seeing the same drop for the 1 core case yet. The name and inspiration of the flag has been taken from io_uring by Jens Axboe. Details about this feature in io_uring can be found in http://kernel.dk/io_uring.pdf, section 8.3. It also addresses most of the denial of service and sendto() concerns raised by Maxim Mikityanskiy in https://www.spinics.net/lists/netdev/msg554657.html. The typical Tx part of an application will have to change from: ret = sendto(fd,....) to: if (xsk_ring_prod__needs_wakeup(&xsk->tx)) ret = sendto(fd,....) and th Rx part from: rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx); if (!rcvd) return; to: rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx); if (!rcvd) { if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) ret = poll(fd,.....); return; } v3 -> v4: * Maxim found a possible race in the Tx part of the driver. The setting of the flag needs to happen before the sending, otherwise it might trigger this race. Fixed in ixgbe and i40e driver. * Mellanox support contributed by Maxim * Removed the XSK_DRV_CAN_SLEEP flag as it was not used anymore. Thanks to Sridhar for discovering this. * For consistency the feature is now always called need_wakeup. There were some places where it was referred to as might_sleep, but they have been removed. Thanks to Sridhar for spotting. * Fixed some typos in the commit messages v2 -> v3: * Converted the Mellanox driver to the new ndo in patch 1 as pointed out by Maxim * Fixed the compatibility code of XDP_MMAP_OFFSETS so it now works. v1 -> v2: * Fixed bisectability problem pointed out by Jakub * Added missing initiliztion of the Tx need_wakeup flag to 1 This patch has been applied against commit `b753c5a7f9` ("Merge branch 'r8152-RX-improve'") Structure of the patch set: Patch 1: Replaces the ndo_xsk_async_xmit with ndo_xsk_wakeup to support waking up both Rx and Tx processing Patch 2: Implements the need_wakeup functionality in common code Patch 3-4: Add need_wakeup support to the i40e and ixgbe drivers Patch 5: Add need_wakeup support to libbpf Patch 6: Add need_wakeup support to the xdpsock sample application Patch 7-8: Add need_wakeup support to the Mellanox mlx5 driver ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:07:32 +02:00
Maxim Mikityanskiy	a7bd4018d6	net/mlx5e: Add AF_XDP need_wakeup support This commit adds support for the new need_wakeup feature of AF_XDP. The applications can opt-in by using the XDP_USE_NEED_WAKEUP bind() flag. When this feature is enabled, some behavior changes: RX side: If the Fill Ring is empty, instead of busy-polling, set the flag to tell the application to kick the driver when it refills the Fill Ring. TX side: If there are pending completions or packets queued for transmission, set the flag to tell the application that it can skip the sendto() syscall and save time. The performance testing was performed on a machine with the following configuration: - 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz - Mellanox ConnectX-5 Ex with 100 Gbit/s link The results with retpoline disabled: \| without need_wakeup \| with need_wakeup \| \|----------------------\|----------------------\| \| one core \| two cores \| one core \| two cores \| -------\|----------\|-----------\|----------\|-----------\| txonly \| 20.1 \| 33.5 \| 29.0 \| 34.2 \| rxdrop \| 0.065 \| 14.1 \| 12.0 \| 14.1 \| l2fwd \| 0.032 \| 7.3 \| 6.6 \| 7.2 \| "One core" means the application and NAPI run on the same core. "Two cores" means they are pinned to different cores. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:07:32 +02:00
Maxim Mikityanskiy	871aa189a6	net/mlx5e: Move the SW XSK code from NAPI poll to a separate function Two XSK tasks are performed during NAPI polling, that are not bound to hardware interrupts: TXing packets and polling for frames in the Fill Ring. They are special in a way that the hardware doesn't know about these tasks, so it doesn't trigger interrupts if there is still some work to be done, it's our driver's responsibility to ensure NAPI will be rescheduled if needed. Create a new function to handle these tasks and move the corresponding code from mlx5e_napi_poll to the new function to improve modularity and prepare for the changes in the following patch. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:07:32 +02:00
Magnus Karlsson	46738f73ea	samples/bpf: add use of need_wakeup flag in xdpsock This commit adds using the need_wakeup flag to the xdpsock sample application. It is turned on by default as we think it is a feature that seems to always produce a performance benefit, if the application has been written taking advantage of it. It can be turned off in the sample app by using the '-m' command line option. The txpush and l2fwd sub applications have also been updated to support poll() with multiple sockets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:07:32 +02:00
Magnus Karlsson	a4500432c2	libbpf: add support for need_wakeup flag in AF_XDP part This commit adds support for the new need_wakeup flag in AF_XDP. The xsk_socket__create function is updated to handle this and a new function is introduced called xsk_ring_prod__needs_wakeup(). This function can be used by the application to check if Rx and/or Tx processing needs to be explicitly woken up. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:07:32 +02:00
Magnus Karlsson	5c129241e2	ixgbe: add support for AF_XDP need_wakeup feature This patch adds support for the need_wakeup feature of AF_XDP. If the application has told the kernel that it might sleep using the new bind flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has no more buffers on the NIC Rx ring and yield to the application. For Tx, it will set the flag if it has no outstanding Tx completion interrupts and return to the application. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:07:32 +02:00
Magnus Karlsson	3d0c5f1cd2	i40e: add support for AF_XDP need_wakeup feature This patch adds support for the need_wakeup feature of AF_XDP. If the application has told the kernel that it might sleep using the new bind flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has no more buffers on the NIC Rx ring and yield to the application. For Tx, it will set the flag if it has no outstanding Tx completion interrupts and return to the application. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:07:32 +02:00
Magnus Karlsson	77cd0d7b3f	xsk: add support for need_wakeup flag in AF_XDP rings This commit adds support for a new flag called need_wakeup in the AF_XDP Tx and fill rings. When this flag is set, it means that the application has to explicitly wake up the kernel Rx (for the bit in the fill ring) or kernel Tx (for bit in the Tx ring) processing by issuing a syscall. Poll() can wake up both depending on the flags submitted and sendto() will wake up tx processing only. The main reason for introducing this new flag is to be able to efficiently support the case when application and driver is executing on the same core. Previously, the driver was just busy-spinning on the fill ring if it ran out of buffers in the HW and there were none on the fill ring. This approach works when the application is running on another core as it can replenish the fill ring while the driver is busy-spinning. Though, this is a lousy approach if both of them are running on the same core as the probability of the fill ring getting more entries when the driver is busy-spinning is zero. With this new feature the driver now sets the need_wakeup flag and returns to the application. The application can then replenish the fill queue and then explicitly wake up the Rx processing in the kernel using the syscall poll(). For Tx, the flag is only set to one if the driver has no outstanding Tx completion interrupts. If it has some, the flag is zero as it will be woken up by a completion interrupt anyway. As a nice side effect, this new flag also improves the performance of the case where application and driver are running on two different cores as it reduces the number of syscalls to the kernel. The kernel tells user space if it needs to be woken up by a syscall, and this eliminates many of the syscalls. This flag needs some simple driver support. If the driver does not support this, the Rx flag is always zero and the Tx flag is always one. This makes any application relying on this feature default to the old behaviour of not requiring any syscalls in the Rx path and always having to call sendto() in the Tx path. For backwards compatibility reasons, this feature has to be explicitly turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend that you always turn it on as it so far always have had a positive performance impact. The name and inspiration of the flag has been taken from io_uring by Jens Axboe. Details about this feature in io_uring can be found in http://kernel.dk/io_uring.pdf, section 8.3. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:07:32 +02:00
Magnus Karlsson	9116e5e2b1	xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new ndo provides the same functionality as before but with the addition of a new flags field that is used to specifiy if Rx, Tx or both should be woken up. The previous ndo only woke up Tx, as implied by the name. The i40e and ixgbe drivers (which are all the supported ones) are updated with this new interface. This new ndo will be used by the new need_wakeup functionality of XDP sockets that need to be able to wake up both Rx and Tx driver processing. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-17 23:07:31 +02:00
Wei Yongjun	e03250061b	btf: fix return value check in btf_vmlinux_init() In case of error, the function kobject_create_and_add() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Fixes: `341dfcf8d7` ("btf: expose BTF info through sysfs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 22:18:17 -07:00
Alexei Starovoitov	82c4c3b7c7	Merge branch 'fix-printf' Quentin Monnet says: ==================== Because the "__printf()" attributes were used only where the functions are implemented, and not in header files, the checks have not been enforced on all the calls to printf()-like functions, and a number of errors slipped in bpftool over time. This set cleans up such errors, and then moves the "__printf()" attributes to header files, so that the checks are performed at all locations. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 22:06:47 -07:00
Quentin Monnet	8918dc42dc	tools: bpftool: move "__printf()" attributes to header file Some functions in bpftool have a "__printf()" format attributes to tell the compiler they should expect printf()-like arguments. But because these attributes are not used for the function prototypes in the header files, the compiler does not run the checks everywhere the functions are used, and some mistakes on format string and corresponding arguments slipped in over time. Let's move the __printf() attributes to the correct places. Note: We add guards around the definition of GCC_VERSION in tools/include/linux/compiler-gcc.h to prevent a conflict in jit_disasm.c on GCC_VERSION from headers pulled via libbfd. Fixes: `c101189bc9` ("tools: bpftool: fix -Wmissing declaration warnings") Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 22:06:46 -07:00
Quentin Monnet	b0ead6d75a	tools: bpftool: fix format string for p_err() in detect_common_prefix() There is one call to the p_err() function in detect_common_prefix() where the message to print is passed directly as the first argument, without using a format string. This is harmless, but may trigger warnings if the "__printf()" attribute is used correctly for the p_err() function. Let's fix it by using a "%s" format string. Fixes: `ba95c74524` ("tools: bpftool: add "prog run" subcommand to test-run programs") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 22:06:46 -07:00
Quentin Monnet	8a15d5ced8	tools: bpftool: fix format string for p_err() in query_flow_dissector() The format string passed to one call to the p_err() function in query_flow_dissector() does not match the value that should be printed, resulting in some garbage integer being printed instead of strerror(errno) if /proc/self/ns/net cannot be open. Let's fix the format string. Fixes: `7f0c57fec8` ("bpftool: show flow_dissector attachment status") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 22:06:46 -07:00
Quentin Monnet	ed4a3983cd	tools: bpftool: fix argument for p_err() in BTF do_dump() The last argument passed to one call to the p_err() function is not correct, it should be "argv" instead of "*argv". This may lead to a segmentation fault error if BTF id cannot be parsed correctly. Let's fix this. Fixes: c93cc69004dt ("bpftool: add ability to dump BTF types") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 22:06:46 -07:00
Quentin Monnet	22c349e8db	tools: bpftool: fix format strings and arguments for jsonw_printf() There are some mismatches between format strings and arguments passed to jsonw_printf() in the BTF dumper for bpftool, which seems harmless but may result in warnings if the "__printf()" attribute is used correctly for jsonw_printf(). Let's fix relevant format strings and type cast. Fixes: `b12d6ec097` ("bpf: btf: add btf print functionality") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 22:06:46 -07:00
Quentin Monnet	9def249dc8	tools: bpftool: fix arguments for p_err() in do_event_pipe() The last argument passed to some calls to the p_err() functions is not correct, it should be "argv" instead of "*argv". This may lead to a segmentation fault error if CPU IDs or indices from the command line cannot be parsed correctly. Let's fix this. Fixes: `f412eed9df` ("tools: bpftool: add simple perf event output reader") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 22:06:46 -07:00
Andrii Nakryiko	dadb81d0af	libbpf: make libbpf.map source of truth for libbpf version Currently libbpf version is specified in 2 places: libbpf.map and Makefile. They easily get out of sync and it's very easy to update one, but forget to update another one. In addition, Github projection of libbpf has to maintain its own version which has to be remembered to be kept in sync manually, which is very error-prone approach. This patch makes libbpf.map a source of truth for libbpf version and uses shell invocation to parse out correct full and major libbpf version to use during build. Now we need to make sure that once new release cycle starts, we need to add (initially) empty section to libbpf.map with correct latest version. This also will make it possible to keep Github projection consistent with kernel sources version of libbpf by adopting similar parsing of version from libbpf.map. v2->v3: - grep -o + sort -rV (Andrey); v1->v2: - eager version vars evaluation (Jakub); - simplified version regex (Andrey); Cc: Andrey Ignatov <rdna@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 17:03:26 -07:00
Alexei Starovoitov	37b7c058d4	Merge branch 'bpftool-net-attach' Daniel T. Lee says: ==================== Currently, bpftool net only supports dumping progs attached on the interface. To attach XDP prog on interface, user must use other tool (eg. iproute2). By this patch, with `bpftool net attach/detach`, user can attach/detach XDP prog on interface. # bpftool prog 16: xdp name xdp_prog1 tag 539ec6ce11b52f98 gpl loaded_at 2019-08-07T08:30:17+0900 uid 0 ... 20: xdp name xdp_fwd_prog tag b9cb69f121e4a274 gpl loaded_at 2019-08-07T08:30:17+0900 uid 0 # bpftool net attach xdpdrv id 16 dev enp6s0np0 # bpftool net xdp: enp6s0np0(4) driver id 16 # bpftool net attach xdpdrv id 20 dev enp6s0np0 overwrite # bpftool net xdp: enp6s0np0(4) driver id 20 # bpftool net detach xdpdrv dev enp6s0np0 # bpftool net xdp: While this patch only contains support for XDP, through `net attach/detach`, bpftool can further support other prog attach types. XDP attach/detach tested on Mellanox ConnectX-4 and Netronome Agilio. --- Changes in v5: - fix wrong error message, from errno to err with do_attach/detach Changes in v4: - rename variable, attach/detach error message enhancement - bash-completion cleanup, doc update with brief description (attach types) Changes in v3: - added 'overwrite' option for replacing previously attached XDP prog - command argument order has been changed ('ATTACH_TYPE' comes first) - add 'dev' keyword in front of <devname> - added bash-completion and documentation Changes in v2: - command 'load/unload' changed to 'attach/detach' for the consistency ==================== Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 17:00:34 -07:00
Daniel T. Lee	cb9d996866	tools: bpftool: add documentation for net attach/detach Since, new sub-command 'net attach/detach' has been added for attaching XDP program on interface, this commit documents usage and sample output of `net attach/detach`. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 17:00:33 -07:00
Daniel T. Lee	10a708c24a	tools: bpftool: add bash-completion for net attach/detach This commit adds bash-completion for new "net attach/detach" subcommand for attaching XDP program on interface. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 17:00:33 -07:00
Daniel T. Lee	37c7f863ba	tools: bpftool: add net detach command to detach XDP on interface By this commit, using `bpftool net detach`, the attached XDP prog can be detached. Detaching the BPF prog will be done through libbpf 'bpf_set_link_xdp_fd' with the progfd set to -1. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 17:00:33 -07:00
Daniel T. Lee	04949ccc27	tools: bpftool: add net attach command to attach XDP on interface By this commit, using `bpftool net attach`, user can attach XDP prog on interface. New type of enum 'net_attach_type' has been made, as stat ted at cover-letter, the meaning of 'attach' is, prog will be attached on interface. With 'overwrite' option at argument, attached XDP program could be replaced. Added new helper 'net_parse_dev' to parse the network device at argument. BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-08-15 17:00:33 -07:00
Quentin Monnet	a9436dca11	tools: bpftool: compile with $(EXTRA_WARNINGS) Compile bpftool with $(EXTRA_WARNINGS), as defined in scripts/Makefile.include, and fix the new warnings produced. Simply leave -Wswitch-enum out of the warning list, as we have several switch-case structures where it is not desirable to process all values of an enum. Remove -Wshadow from the warnings we manually add to CFLAGS, as it is handled in $(EXTRA_WARNINGS). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-14 22:57:36 +02:00
Jakub Kicinski	c162610c7d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Rename mss field to mss_option field in synproxy, from Fernando Mancera. 2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce. 3) More strict validation of IPVS sysctl values, from Junwei Hu. 4) Remove unnecessary spaces after on the right hand side of assignments, from yangxingwu. 5) Add offload support for bitwise operation. 6) Extend the nft_offload_reg structure to store immediate date. 7) Collapse several ip_set header files into ip_set.h, from Jeremy Sowden. 8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y, from Jeremy Sowden. 9) Fix several sparse warnings due to missing prototypes, from Valdis Kletnieks. 10) Use static lock initialiser to ensure connlabel spinlock is initialized on boot time to fix sched/act_ct.c, patch from Florian Westphal. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:22:57 -07:00
Jakub Kicinski	b753c5a7f9	Merge branch 'r8152-RX-improve' Hayes says: ==================== v2: For patch #2, replace list_for_each_safe with list_for_each_entry_safe. Remove unlikely in WARN_ON. Adjust the coding style. For patch #4, replace list_for_each_safe with list_for_each_entry_safe. Remove "else" after "continue". For patch #5. replace sysfs with ethtool to modify rx_copybreak and rx_pending. v1: The different chips use different rx buffer size. Use skb_add_rx_frag() to reduce memory copy for RX. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:12:45 -07:00
Hayes Wang	e4a5017ac5	r8152: change rx_copybreak and rx_pending through ethtool Let the rx_copybreak and rx_pending could be modified by ethtool. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:12:09 -07:00
Hayes Wang	47922fcde5	r8152: support skb_add_rx_frag Use skb_add_rx_frag() to reduce the memory copy for rx data. Use a new list of rx_used to store the rx buffer which couldn't be reused yet. Besides, the total number of rx buffer may be increased or decreased dynamically. And it is limited by RTL8152_MAX_RX_AGG. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:12:08 -07:00
Hayes Wang	d55d70894c	r8152: use alloc_pages for rx buffer Replace kmalloc_node() with alloc_pages() for rx buffer. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:12:08 -07:00
Hayes Wang	252df8b866	r8152: replace array with linking list for rx information The original method uses an array to store the rx information. The new one uses a list to link each rx structure. Then, it is possible to increase/decrease the number of rx structure dynamically. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:12:08 -07:00
Hayes Wang	ec5791c202	r8152: separate the rx buffer size The different chips may accept different rx buffer sizes. The RTL8152 supports 16K bytes, and RTL8153 support 32K bytes. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:12:08 -07:00
Jakub Kicinski	e070ca371f	Merge branch 'net-phy-let-phy_speed_down-up-support-speeds-1Gbps' Heiner says: ==================== So far phy_speed_down/up can be used up to 1Gbps only. Remove this restriction and add needed helpers to phy-core.c v2: - remove unused parameter in patch 1 - rename __phy_speed_down to phy_speed_down_core in patch 2 ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:16:11 -07:00
Heiner Kallweit	65b27995a4	net: phy: let phy_speed_down/up support speeds >1Gbps So far phy_speed_down/up can be used up to 1Gbps only. Remove this restriction by using new helper __phy_speed_down. New member adv_old in struct phy_device is used by phy_speed_up to restore the advertised modes before calling phy_speed_down. Don't simply advertise what is supported because a user may have intentionally removed modes from advertisement. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:14:06 -07:00
Heiner Kallweit	331c56ac73	net: phy: add phy_speed_down_core and phy_resolve_min_speed phy_speed_down_core provides most of the functionality for phy_speed_down. It makes use of new helper phy_resolve_min_speed that is based on the sorting of the settings[] array. In certain cases it may be helpful to be able to exclude legacy half duplex modes, therefore prepare phy_resolve_min_speed() for it. v2: - rename __phy_speed_down to phy_speed_down_core Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:14:06 -07:00
Heiner Kallweit	7b261e0ef5	net: phy: add __set_linkmode_max_speed We will need the functionality of __set_linkmode_max_speed also for linkmode bitmaps other than phydev->supported. Therefore split it. v2: - remove unused parameter from __set_linkmode_max_speed Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:14:06 -07:00

1 2 3 4 5 ...

856975 Commits All Branches Search

856975 Commits

All Branches