OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Emmanuel Grumbach	3d2a2544ea	nl80211: vendor-cmd: add Intel vendor commands for iwlmei usage iwlmei allows to integrate with the CSME firmware. There are flows that are prioprietary for this purpose: * Get the information of the AP the CSME firmware is connected to. This is useful when we need to speed up the connection process in case the CSME firmware has a TCP connection that must be kept alive across the ownership transition. * Forbid roaming, which will happen when the CSME firmware wants to tell the user space not disrupt the connection. * Request ownership, upon driver boot when the CSME firmware owns the device. This is a notification sent by the kernel. All those commands are expected to be used by any software managing the connection (mainly NetworkManager). Those commands are expected to be used only in case the CSME firmware owns the device and doesn't want to release the device unless the host made sure that it can keep the connectivity. Here are the steps of the expected flow: 1) The machine boots while AMT has an active TCP connection 2) iwlwifi starts and tries to access the device 3) The device is not available because of the active TCP connection. (If there are no active connections, the CSME firmware would have allowed iwlwifi to use the device) Note that all the steps up to here don't involve iwlmei. All this happens in iwlwifi (in iwl_pcie_prepare_card_hw). 4) iwlmei establishes a connection to the CSME firmware (through SAP) Here iwlwifi uses iwlmei to access the device's capabilities (since it can't touch the device), but this is not relevant for the vendor commands. 5) The CSME firmware tells iwlmei that it uses the NIC and that there is an acitve TCP connection, and hence, the host needs to think twice before asking the CSME firmware to release the device 6) iwlmei tells iwlwifi to report HW RFKILL with a special reason Up to here, there was no user space involved. 7) The user space (NetworkManager) boots and sees that the device is in RFKILL because the host doesn't own the device 8) The user space asks the kernel what AP the CSME firmware is connected to (with the first vendor command mentionned above) 9) The user space checks if it has a profile that matches the reply from the CSME firmware 10) The user space installs a network to the wpa_supplicant with a specific BSSID and a specific frequency 11) The user space prevents any type of full scan 12) The user space asks iwlmei to request ownership on the device (with the third vendor command) 13) iwlmei request ownership from the CSME firmware 14) The CSME firmware grants ownership 15) iwlmei tells iwlwifi to lift the RFKILL 16) RFKILL OFF is reported to userspace 17) The host boots the device, loads the firwmare, and connect to a specific BSSID without scanning including IP in less than 600ms (this is what I measured, of course it depends on many factors) 18) The host reports to the CSME firmware that there is a connection 19) The TCP connection is preserved and the host has now connectivity 20) Later, the TCP connection to the CSME firmware is terminated 21) The CSME firmware tells iwlmei that it is now free to do whatever it likes 22) iwlwifi sends the second vendor command to tell the user space that it can remove the special network configuration and pick any SSID / BSSID it likes. Co-Developed-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20210625081717.7680-4-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-13 09:50:24 +02:00
Nikolay Aleksandrov	dc002875c2	net: bridge: vlan: use br_rports_fill_info() to export mcast router ports Embed the standard multicast router port export by br_rports_fill_info() into a new global vlan attribute BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS. In order to have the same format for the global bridge mcast context and the per-vlan mcast context we need a double-nesting: - BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS - MDBA_ROUTER Currently we don't compare router lists, if any router port exists in the bridge mcast contexts we consider their option sets as different and export them separately. In addition we export the router port vlan id when dumping similar to the router port notification format. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	a97df080b6	net: bridge: vlan: add support for mcast router global option Add support to change and retrieve global vlan multicast router state which is used for the bridge itself. We just need to pass multicast context to br_multicast_set_router instead of bridge device and the rest of the logic remains the same. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	62938182c3	net: bridge: vlan: add support for mcast querier global option Add support to change and retrieve global vlan multicast querier state. We just need to pass multicast context to br_multicast_set_querier instead of bridge device and the rest of the logic remains the same. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	941121ee22	net: bridge: vlan: add support for mcast startup query interval global option Add support to change and retrieve global vlan multicast startup query interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	425214508b	net: bridge: vlan: add support for mcast query response interval global option Add support to change and retrieve global vlan multicast query response interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	d6c08aba4f	net: bridge: vlan: add support for mcast query interval global option Add support to change and retrieve global vlan multicast query interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	cd9269d463	net: bridge: vlan: add support for mcast querier interval global option Add support to change and retrieve global vlan multicast querier interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	2da0aea21f	net: bridge: vlan: add support for mcast membership interval global option Add support to change and retrieve global vlan multicast membership interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	77f6ababa2	net: bridge: vlan: add support for mcast last member interval global option Add support to change and retrieve global vlan multicast last member interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	50725f6e6b	net: bridge: vlan: add support for mcast startup query count global option Add support to change and retrieve global vlan multicast startup query count option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	931ba87d20	net: bridge: vlan: add support for mcast last member count global option Add support to change and retrieve global vlan multicast last member count option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	df271cd641	net: bridge: vlan: add support for mcast igmp/mld version global options Add support to change and retrieve global vlan IGMP/MLD versions. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
David S. Miller	6f45933dfe	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Use nfnetlink_unicast() instead of netlink_unicast() in nft_compat. 2) Remove call to nf_ct_l4proto_find() in flowtable offload timeout fixup. 3) CLUSTERIP registers ARP hook on demand, from Florian. 4) Use clusterip_net to store pernet warning, also from Florian. 5) Remove struct netns_xt, from Florian Westphal. 6) Enable ebtables hooks in initns on demand, from Florian. 7) Allow to filter conntrack netlink dump per status bits, from Florian Westphal. 8) Register x_tables hooks in initns on demand, from Florian. 9) Remove queue_handler from per-netns structure, again from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 10:22:26 +01:00
Florian Westphal	9344988d29	netfilter: ctnetlink: allow to filter dump by status bits If CTA_STATUS is present, but CTA_STATUS_MASK is not, then the mask is automatically set to 'status', so that kernel returns those entries that have all of the requested bits set. This makes more sense than using a all-one mask since we'd hardly ever find a match. There are no other checks for status bits, so if e.g. userspace sets impossible combinations it will get an empty dump. If kernel would reject unknown status bits, then a program that works on a future kernel that has IPS_FOO bit fails on old kernels. Same for 'impossible' combinations: Kernel never sets ASSURED without first having set SEEN_REPLY, but its possible that a future kernel could do so. Therefore no sanity tests other than a 0-mask. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-08-05 13:36:39 +02:00
Gustavo A. R. Silva	db243b7964	net/ipv4/ipv6: Replace one-element arraya with flexible-array members There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Use an anonymous union with a couple of anonymous structs in order to keep userspace unchanged and refactor the related code accordingly: $ pahole -C group_filter net/ipv4/ip_sockglue.o struct group_filter { union { struct { __u32 gf_interface_aux; /* 0 4 / / XXX 4 bytes hole, try to pack / struct __kernel_sockaddr_storage gf_group_aux; / 8 128 / / --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- / __u32 gf_fmode_aux; / 136 4 / __u32 gf_numsrc_aux; / 140 4 / struct __kernel_sockaddr_storage gf_slist[1]; / 144 128 / }; / 0 272 / struct { __u32 gf_interface; / 0 4 / / XXX 4 bytes hole, try to pack / struct __kernel_sockaddr_storage gf_group; / 8 128 / / --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- / __u32 gf_fmode; / 136 4 / __u32 gf_numsrc; / 140 4 / struct __kernel_sockaddr_storage gf_slist_flex[0]; / 144 0 / }; / 0 144 / }; / 0 272 / / size: 272, cachelines: 5, members: 1 / / last cacheline: 16 bytes / }; $ pahole -C compat_group_filter net/ipv4/ip_sockglue.o struct compat_group_filter { union { struct { __u32 gf_interface_aux; / 0 4 / struct __kernel_sockaddr_storage gf_group_aux __attribute__((__aligned__(4))); / 4 128 / / --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- / __u32 gf_fmode_aux; / 132 4 / __u32 gf_numsrc_aux; / 136 4 / struct __kernel_sockaddr_storage gf_slist[1] __attribute__((__aligned__(4))); / 140 128 / } __attribute__((__packed__)) __attribute__((__aligned__(4))); / 0 268 / struct { __u32 gf_interface; / 0 4 / struct __kernel_sockaddr_storage gf_group __attribute__((__aligned__(4))); / 4 128 / / --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- / __u32 gf_fmode; / 132 4 / __u32 gf_numsrc; / 136 4 / struct __kernel_sockaddr_storage gf_slist_flex[0] __attribute__((__aligned__(4))); / 140 0 / } __attribute__((__packed__)) __attribute__((__aligned__(4))); / 0 140 / } __attribute__((__aligned__(1))); / 0 268 / / size: 268, cachelines: 5, members: 1 / / forced alignments: 1 / / last cacheline: 12 bytes */ } __attribute__((__packed__)); This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/109 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-05 11:46:42 +01:00
Pavel Tikhomirov	04190bf894	sock: allow reading and changing sk_userlocks with setsockopt SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and tcp_sndbuf_expand()). If we've just created a new socket this adjustment is enabled on it, but if one changes the socket buffer size by setsockopt(SO_{SND,RCV}BUF) it becomes disabled. CRIU needs to call setsockopt(SO_{SND,RCV}BUF) on each socket on restore as it first needs to increase buffer sizes for packet queues restore and second it needs to restore back original buffer sizes. So after CRIU restore all sockets become non-auto-adjustable, which can decrease network performance of restored applications significantly. CRIU need to be able to restore sockets with enabled/disabled adjustment to the same state it was before dump, so let's add special setsockopt for it. Let's also export SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags to uAPI so that using these interface one can reenable automatic socket buffer adjustment on their sockets. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-04 12:52:03 +01:00
David S. Miller	9c0532f9cc	linux-can-next-for-5.15-20210804 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmEKaBUTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqSvgCACpR64hydl7/qt9QGnm9Ym6/v/L9y9v aBfZMQsedP1GSuev5PpxghXU4GF0LXiDr6ryr0hhu7w2ojjlLNl9sVHCF9qdAJKz x2D4YTlxct2KuPBdhWllQr/KWFbJh2IzarHEWzdo+QoU5A8jDlsK2kLeeikFECzT fVUe3mu1k66/DvHsetsfzIvbUkuHk2SPpK/pwrUC6Siw6wQZBHlSoUEtBNwEPlyH 8+ZQJPqtrjr2v3mZUOkgHrlXEOZRu6OM3i1Yv2bn2x4VI+3KQHEw/cA1WNE2AOzN CfMp4sS98QdCrAboX4VJZpGAbziTFHedqFjjIP9ultCfH9ROHhQj4Zsl =37wt -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.15-20210804' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2021-08-04 this is a pull request of 5 patches for net-next/master. The first patch is by me and fixes a typo in a comment in the CAN J1939 protocol. The next 2 patches are by Oleksij Rempel and update the CAN J1939 protocol to send RX status updates via the error queue mechanism. The next patch is by me and adds a missing variable initialization to the flexcan driver (the problem was introduced in the current net-next cycle). The last patch is by Aswath Govindraju and adds power-domains to the Bosch m_can DT binding documentation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-04 11:30:09 +01:00
Oleksij Rempel	5b9272e93f	can: j1939: extend UAPI to notify about RX status To be able to create applications with user friendly feedback, we need be able to provide receive status information. Typical ETP transfer may take seconds or even hours. To give user some clue or show a progress bar, the stack should push status updates. Same as for the TX information, the socket error queue will be used with following new signals: - J1939_EE_INFO_RX_RTS - received and accepted request to send signal. - J1939_EE_INFO_RX_DPO - received data package offset signal - J1939_EE_INFO_RX_ABORT - RX session was aborted Instead of completion signal, user will get data package. To activate this signals, application should set SOF_TIMESTAMPING_RX_SOFTWARE to the SO_TIMESTAMPING socket option. This will avoid unpredictable application behavior for the old software. Link: https://lore.kernel.org/r/20210707094854.30781-3-o.rempel@pengutronix.de Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-08-04 12:11:52 +02:00
Hangbin Liu	3a755cd8b7	bonding: add new option lacp_active Add an option lacp_active, which is similar with team's runner.active. This option specifies whether to send LACPDU frames periodically. If set on, the LACPDU frames are sent along with the configured lacp_rate setting. If set off, the LACPDU frames acts as "speak when spoken to". Note, the LACPDU state frames still will be sent when init or unbind port. v2: remove module parameter Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-03 11:50:22 +01:00
Gustavo A. R. Silva	2d3e5caf96	net/ipv4: Replace one-element array with flexible-array member There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Use an anonymous union with a couple of anonymous structs in order to keep userspace unchanged: $ pahole -C ip_msfilter net/ipv4/ip_sockglue.o struct ip_msfilter { union { struct { __be32 imsf_multiaddr_aux; /* 0 4 / __be32 imsf_interface_aux; / 4 4 / __u32 imsf_fmode_aux; / 8 4 / __u32 imsf_numsrc_aux; / 12 4 / __be32 imsf_slist[1]; / 16 4 / }; / 0 20 / struct { __be32 imsf_multiaddr; / 0 4 / __be32 imsf_interface; / 4 4 / __u32 imsf_fmode; / 8 4 / __u32 imsf_numsrc; / 12 4 / __be32 imsf_slist_flex[0]; / 16 0 / }; / 0 16 / }; / 0 20 / / size: 20, cachelines: 1, members: 1 / / last cacheline: 20 bytes */ }; Also, refactor the code accordingly and make use of the struct_size() and flex_array_size() helpers. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/109 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-02 15:17:28 +01:00
Cong Wang	695176bfe5	net_sched: refactor TC action init API TC action ->init() API has 10 parameters, it becomes harder to read. Some of them are just boolean and can be replaced by flags. Similarly for the internal API tcf_action_init() and tcf_exts_validate(). This patch converts them to flags and fold them into the upper 16 bits of "flags", whose lower 16 bits are still reserved for user-space. More specifically, the following kernel flags are introduced: TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to distinguish whether it is compatible with policer. TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether this action is bound to a filter. TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts, means we are replacing an existing action. TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the opposite meaning, because we still hold RTNL in most cases. The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is untouched and still stored as before. I have tested this patch with tdc and I do not see any failure related to this patch. Tested-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-02 10:24:38 +01:00
Jakub Kicinski	d2e11fd2b7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicting commits, all resolutions pretty trivial: drivers/bus/mhi/pci_generic.c `5c2c853159` ("bus: mhi: pci-generic: configurable network interface MRU") `56f6f4c4eb` ("bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean") drivers/nfc/s3fwrn5/firmware.c `a0302ff590` ("nfc: s3fwrn5: remove unnecessary label") `46573e3ab0` ("nfc: s3fwrn5: fix undefined parameter values in dev_err()") `801e541c79` ("nfc: s3fwrn5: fix undefined parameter values in dev_err()") MAINTAINERS `7d901a1e87` ("net: phy: add Maxlinear GPY115/21x/24x driver") `8a7b46fa79` ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-07-31 09:14:46 -07:00
Matt Johnston	03f2bbc4ee	mctp: Allow per-netns default networks Currently we have a compile-time default network (MCTP_INITIAL_DEFAULT_NET). This change introduces a default_net field on the net namespace, allowing future configuration for new interfaces. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-29 15:06:50 +01:00
Jeremy Kerr	583be982d9	mctp: Add device handling and netlink interface This change adds the infrastructure for managing MCTP netdevices; we add a pointer to the AF_MCTP-specific data to struct netdevice, and hook up the rtnetlink operations for adding and removing addresses. Includes changes from Matt Johnston <matt@codeconstruct.com.au>. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-29 15:06:50 +01:00
Jeremy Kerr	4b2e69305c	mctp: Add initial driver infrastructure Add an empty drivers/net/mctp/, for future interface drivers. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-29 15:06:50 +01:00
Jeremy Kerr	60fc639816	mctp: Add sockaddr_mctp to uapi This change introduces the user-visible MCTP header, containing the protocol-specific addressing definitions. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-29 15:06:50 +01:00
Jeremy Kerr	bc49d8169a	mctp: Add MCTP base Add basic Kconfig, an initial (empty) af_mctp source object, and {AF,PF}_MCTP definitions, and the required definitions for a new protocol type. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-29 15:06:49 +01:00
Tony Luck	25905f602f	dmaengine: idxd: Change license on idxd.h to LGPL This file was given GPL-2.0 license. But LGPL-2.1 makes more sense as it needs to be used by libraries outside of the kernel source tree. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-28 10:22:43 -07:00
Peilin Ye	56af5e749f	net/sched: act_skbmod: Add SKBMOD_F_ECN option support Currently, when doing rate limiting using the tc-police(8) action, the easiest way is to simply drop the packets which exceed or conform the configured bandwidth limit. Add a new option to tc-skbmod(8), so that users may use the ECN [1] extension to explicitly inform the receiver about the congestion instead of dropping packets "on the floor". The 2 least significant bits of the Traffic Class field in IPv4 and IPv6 headers are used to represent different ECN states [2]: 0b00: "Non ECN-Capable Transport", Non-ECT 0b10: "ECN Capable Transport", ECT(0) 0b01: "ECN Capable Transport", ECT(1) 0b11: "Congestion Encountered", CE As an example: $ tc filter add dev eth0 parent 1: protocol ip prio 10 \ matchall action skbmod ecn Doing the above marks all ECT(0) and ECT(1) packets as CE. It does NOT affect Non-ECT or non-IP packets. In the tc-police scenario mentioned above, users may pipe a tc-police action and a tc-skbmod "ecn" action together to achieve ECN-based rate limiting. For TCP connections, upon receiving a CE packet, the receiver will respond with an ECE packet, asking the sender to reduce their congestion window. However ECN also works with other L4 protocols e.g. DCCP and SCTP [2], and our implementation does not touch or care about L4 headers. The updated tc-skbmod SYNOPSIS looks like the following: tc ... action skbmod { set SETTABLE \| swap SWAPPABLE \| ecn } ... Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod command. Trying to use more than one of them at a time is considered undefined behavior; pipe multiple tc-skbmod commands together instead. "set" and "swap" only affect Ethernet packets, while "ecn" only affects IPv{4,6} packets. It is also worth mentioning that, in theory, the same effect could be achieved by piping a "police" action and a "bpf" action using the bpf_skb_ecn_set_ce() helper, but this requires eBPF programming from the user, thus impractical. Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets". [1] https://datatracker.ietf.org/doc/html/rfc3168 [2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-28 13:19:31 +01:00
Linus Torvalds	7d549995d4	RDMA v5.14 first rc Pull Request - Many more irdma fixups from bots/etc - bnxt_re regression in their counters from a FW upgrade - User triggerable memory leak in rxe -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmD/PpEACgkQOG33FX4g mxouhBAAgpjHUgALVAcYPyqXyAqr9KLIv6HB79OWP5W8I+XDDAbedoIJy3FWWm1d z5l0kBwUCH216KyNrXlikqBM61dluA8N9J7lPCMOolU0dLMaC0MeXCBir2VV4NU9 lSVasmrBPnZFWk38VgTC4rZr4lt+JLyKYwg7BBozCoDhcX/TuYv2HFB9YCJ9Vv+D u50K1Aj7/fhXfInDuIcICHfvE0d2YaN0+apEQ/Mk11vVGcxxjv/mYraGyFqZkN0M 6yuNZXXIPj9x/gPY8kWp4mBipY6W/cnjLzUEeKQem2rOXpS1baxK9PZlW+X8qlMq P+9s6EGuxDWcYfU1VuTBwmuM91/kKch6nYtmhHPTv0Wrk/fFZpXM8kD+mi8Vbg4c aE+0DfpLLnpyu8/BKNmO7s0STT2Ea0VCrlWlGwjg93Y53qCwFsexg/HXO27U8DS2 m2f0DKuGvIHlaUW7OaNADmAzpQ3NikVohBL9hAApNbgjBN8xrifSGdiuUbJ6MU0l a207yVSQUc96eS+fDJRAngUgYEoO9o1JVeHXGaMs6Rwpu5iSHk1wpQTcQ7vTSeJP CNjPWAoFe/Jv/S4r0E2s43u0ajV22G6NyWVUQ+eIb8BD5J7fTDn6466vnH8RyQNY OgwXaYnoadC27SDVr9o6LyLNJYyqUnA1nmZcLFl8gE0LenxyldU= =g54k -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "Nothing very exciting here, mainly just a bunch of irdma fixes. irdma is a new driver this cycle so it to be expected. - Many more irdma fixups from bots/etc - bnxt_re regression in their counters from a FW upgrade - User triggerable memory leak in rxe" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/irdma: Change returned type of irdma_setup_virt_qp to void RDMA/irdma: Change the returned type of irdma_set_hw_rsrc to void RDMA/irdma: change the returned type of irdma_sc_repost_aeq_entries to void RDMA/irdma: Check vsi pointer before using it RDMA/rxe: Fix memory leak in error path code RDMA/irdma: Change the returned type to void RDMA/irdma: Make spdxcheck.py happy RDMA/irdma: Fix unused variable total_size warning RDMA/bnxt_re: Fix stats counters	2021-07-27 14:13:33 -07:00
Mark Gray	784dcfa56e	openvswitch: fix alignment issues Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-27 11:48:42 +01:00
Mark Gray	e4252cb666	openvswitch: update kdoc OVS_DP_ATTR_PER_CPU_PIDS Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-27 11:48:42 +01:00
Justin Iurman	3edede08ff	ipv6: ioam: Support for IOAM injection with lwtunnels Add support for the IOAM inline insertion (only for the host-to-host use case) which is per-route configured with lightweight tunnels. The target is iproute2 and the patch is ready. It will be posted as soon as this patchset is merged. Here is an overview: $ ip -6 ro ad fc00::1/128 encap ioam6 trace type 0x800000 ns 1 size 12 dev eth0 This example configures an IOAM Pre-allocated Trace option attached to the fc00::1/128 prefix. The IOAM namespace (ns) is 1, the size of the pre-allocated trace data block is 12 octets (size) and only the first IOAM data (bit 0: hop_limit + node id) is included in the trace (type) represented as a bitfield. The reason why the in-transit (IPv6-in-IPv6 encapsulation) use case is not implemented is explained on the patchset cover. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:14:33 -07:00
Justin Iurman	8c6f6fa677	ipv6: ioam: IOAM Generic Netlink API Add Generic Netlink commands to allow userspace to configure IOAM namespaces and schemas. The target is iproute2 and the patch is ready. It will be posted as soon as this patchset is merged. Here is an overview: $ ip ioam Usage: ip ioam { COMMAND \| help } ip ioam namespace show ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ] ip ioam namespace del ID ip ioam schema show ip ioam schema add ID DATA ip ioam schema del ID ip ioam namespace set ID schema { ID \| none } Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:14:33 -07:00
Justin Iurman	9ee11f0fff	ipv6: ioam: Data plane support for Pre-allocated Trace Implement support for processing the IOAM Pre-allocated Trace with IPv6, see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option, see IANA [3]. A new per-interface sysctl is introduced. The value is a boolean to accept (=1) or ignore (=0, by default) IPv6 IOAM options on ingress for an interface: - net.ipv6.conf.XXX.ioam6_enabled Two other sysctls are introduced to define IOAM IDs, represented by an integer. They are respectively per-namespace and per-interface: - net.ipv6.ioam6_id - net.ipv6.conf.XXX.ioam6_id The value of the first one represents the IOAM ID of the node itself (u32; max and default value = U32_MAX>>8, due to hop limit concatenation) while the other represents the IOAM ID of an interface (u16; max and default value = U16_MAX). Each "ioam6_id" sysctl has a "_wide" equivalent: - net.ipv6.ioam6_id_wide - net.ipv6.conf.XXX.ioam6_id_wide The value of the first one represents the wide IOAM ID of the node itself (u64; max and default value = U64_MAX>>8, due to hop limit concatenation) while the other represents the wide IOAM ID of an interface (u32; max and default value = U32_MAX). The use of short and wide equivalents is not exclusive, a deployment could choose to leverage both. For example, net.ipv6.conf.XXX.ioam6_id (short format) could be an identifier for a physical interface, whereas net.ipv6.conf.XXX.ioam6_id_wide (wide format) could be an identifier for a logical sub-interface. Documentation about new sysctls is provided at the end of this patchset. Two relativistic hash tables are used: one for IOAM namespaces, the other for IOAM schemas. A namespace can only have a single active schema and a schema can only be attached to a single namespace (1:1 relationship). [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2 Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:14:33 -07:00
Justin Iurman	db67f219fc	uapi: IPv6 IOAM headers definition This patch provides the IPv6 IOAM option header [1] as well as the IOAM Trace header [2]. An IOAM option must be 4n-aligned. Here is an overview of a Hop-by-Hop with an IOAM Trace option: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| Next header \| Hdr Ext Len \| Padding \| Padding \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| Option Type \| Opt Data Len \| Reserved \| IOAM Type \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| Namespace-ID \| NodeLen \| Flags \| RemainingLen\| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| IOAM-Trace-Type \| Reserved \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+ \| \| \| \| node data [n] \| \| \| \| \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ D \| \| a \| node data [n-1] \| t \| \| a +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ ... ~ S +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ p \| \| a \| node data [1] \| c \| \| e +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| \| \| \| \| node data [0] \| \| \| \| \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+ The IOAM option header starts at "Option Type" and ends after "IOAM Type". The IOAM Trace header starts at "Namespace-ID" and ends after "IOAM-Trace-Type/Reserved". IOAM Type: either Pre-allocated Trace (=0), Incremental Trace (=1), Proof-of-Transit (=2) or Edge-to-Edge (=3). Note that both the Pre-allocated Trace and the Incremental Trace look the same. The two others are not implemented. Namespace-ID: IOAM namespace identifier, not to be confused with network namespaces. It adds further context to IOAM options and associated data, and allows devices which are IOAM capable to determine whether IOAM options must be processed or ignored. It can also be used by an operator to distinguish different operational domains or to identify different sets of devices. NodeLen: Length of data added by each node. It depends on the Trace Type. Flags: Only the Overflow (O) flag for now. The O flag is set by a transit node when there are not enough octets left to record its data. RemainingLen: Remaining free space to record data. IOAM-Trace-Type: Bit field where each bit corresponds to a specific kind of IOAM data. See [2] for a detailed list. [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:14:33 -07:00
Nikolay Aleksandrov	9dee572c38	net: bridge: vlan: add mcast snooping control Add a new global vlan option which controls whether multicast snooping is enabled or disabled for a single vlan. It controls the vlan private flag: BR_VLFLAG_GLOBAL_MCAST_ENABLED. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	743a53d963	net: bridge: vlan: add support for dumping global vlan options Add a new vlan options dump flag which causes only global vlan options to be dumped. The dumps are done only with bridge devices, ports are ignored. They support vlan compression if the options in sequential vlans are equal (currently always true). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	47ecd2dbd8	net: bridge: vlan: add support for global options We can have two types of vlan options depending on context: - per-device vlan options (split in per-bridge and per-port) - global vlan options The second type wasn't supported in the bridge until now, but we need them for per-vlan multicast support, per-vlan STP support and other options which require global vlan context. They are contained in the global bridge vlan context even if the vlan is not configured on the bridge device itself. This patch adds initial netlink attributes and support for setting these global vlan options, they can only be set (RTM_NEWVLAN) and the operation must use the bridge device. Since there are no such options yet it shouldn't have any functional effect. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	1e9ca45662	net: bridge: multicast: include router port vlan id in notifications Use the port multicast context to check if the router port is a vlan and in case it is include its vlan id in the notification. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	f4b7002a70	net: bridge: add vlan mcast snooping knob Add a global knob that controls if vlan multicast snooping is enabled. The proper contexts (vlan or bridge-wide) will be chosen based on the knob when processing packets and changing bridge device state. Note that vlans have their individual mcast snooping enabled by default, but this knob is needed to turn on bridge vlan snooping. It is disabled by default. To enable the knob vlan filtering must also be enabled, it doesn't make sense to have vlan mcast snooping without vlan filtering since that would lead to inconsistencies. Disabling vlan filtering will also automatically disable vlan mcast snooping. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Mark Gray	b83d23a2a3	openvswitch: Introduce per-cpu upcall dispatch The Open vSwitch kernel module uses the upcall mechanism to send packets from kernel space to user space when it misses in the kernel space flow table. The upcall sends packets via a Netlink socket. Currently, a Netlink socket is created for every vport. In this way, there is a 1:1 mapping between a vport and a Netlink socket. When a packet is received by a vport, if it needs to be sent to user space, it is sent via the corresponding Netlink socket. This mechanism, with various iterations of the corresponding user space code, has seen some limitations and issues: * On systems with a large number of vports, there is a correspondingly large number of Netlink sockets which can limit scaling. (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) * Packet reordering on upcalls. (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) * A thundering herd issue. (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) This patch introduces an alternative, feature-negotiated, upcall mode using a per-cpu dispatch rather than a per-vport dispatch. In this mode, the Netlink socket to be used for the upcall is selected based on the CPU of the thread that is executing the upcall. In this way, it resolves the issues above as: a) The number of Netlink sockets scales with the number of CPUs rather than the number of vports. b) Ordering per-flow is maintained as packets are distributed to CPUs based on mechanisms such as RSS and flows are distributed to a single user space thread. c) Packets from a flow can only wake up one user space thread. The corresponding user space code can be found at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html Bugzilla: https://bugzilla.redhat.com/1844576 Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 11:06:33 -07:00
David S. Miller	82a1ffe57e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-07-15 The following pull-request contains BPF updates for your net-next tree. We've added 45 non-merge commits during the last 15 day(s) which contain a total of 52 files changed, 3122 insertions(+), 384 deletions(-). The main changes are: 1) Introduce bpf timers, from Alexei. 2) Add sockmap support for unix datagram socket, from Cong. 3) Fix potential memleak and UAF in the verifier, from He. 4) Add bpf_get_func_ip helper, from Jiri. 5) Improvements to generic XDP mode, from Kumar. 6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi. =================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-15 22:40:10 -07:00
Jiri Olsa	9ffd9f3ff7	bpf: Add bpf_get_func_ip helper for kprobe programs Adding bpf_get_func_ip helper for BPF_PROG_TYPE_KPROBE programs, so it's now possible to call bpf_get_func_ip from both kprobe and kretprobe programs. Taking the caller's address from 'struct kprobe::addr', which is defined for both kprobe and kretprobe. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-5-jolsa@kernel.org	2021-07-15 17:59:09 -07:00
Jiri Olsa	9b99edcae5	bpf: Add bpf_get_func_ip helper for tracing programs Adding bpf_get_func_ip helper for BPF_PROG_TYPE_TRACING programs, specifically for all trampoline attach types. The trampoline's caller IP address is stored in (ctx - 8) address. so there's no reason to actually call the helper, but rather fixup the call instruction and return [ctx - 8] value directly. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-4-jolsa@kernel.org	2021-07-15 17:58:41 -07:00
Alexei Starovoitov	b00628b1c7	bpf: Introduce bpf timers. Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded in hash/array/lru maps as a regular field and helpers to operate on it: // Initialize the timer. // First 4 bits of 'flags' specify clockid. // Only CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTTIME are allowed. long bpf_timer_init(struct bpf_timer timer, struct bpf_map map, int flags); // Configure the timer to call 'callback_fn' static function. long bpf_timer_set_callback(struct bpf_timer timer, void callback_fn); // Arm the timer to expire 'nsec' nanoseconds from the current time. long bpf_timer_start(struct bpf_timer timer, u64 nsec, u64 flags); // Cancel the timer and wait for callback_fn to finish if it was running. long bpf_timer_cancel(struct bpf_timer timer); Here is how BPF program might look like: struct map_elem { int counter; struct bpf_timer timer; }; struct { __uint(type, BPF_MAP_TYPE_HASH); __uint(max_entries, 1000); __type(key, int); __type(value, struct map_elem); } hmap SEC(".maps"); static int timer_cb(void map, int key, struct map_elem val); / val points to particular map element that contains bpf_timer. / SEC("fentry/bpf_fentry_test1") int BPF_PROG(test1, int a) { struct map_elem val; int key = 0; val = bpf_map_lookup_elem(&hmap, &key); if (val) { bpf_timer_init(&val->timer, &hmap, CLOCK_REALTIME); bpf_timer_set_callback(&val->timer, timer_cb); bpf_timer_start(&val->timer, 1000 /* call timer_cb2 in 1 usec */, 0); } } This patch adds helper implementations that rely on hrtimers to call bpf functions as timers expire. The following patches add necessary safety checks. Only programs with CAP_BPF are allowed to use bpf_timer. The amount of timers used by the program is constrained by the memcg recorded at map creation time. The bpf_timer_init() helper needs explicit 'map' argument because inner maps are dynamic and not known at load time. While the bpf_timer_set_callback() is receiving hidden 'aux->prog' argument supplied by the verifier. The prog pointer is needed to do refcnting of bpf program to make sure that program doesn't get freed while the timer is armed. This approach relies on "user refcnt" scheme used in prog_array that stores bpf programs for bpf_tail_call. The bpf_timer_set_callback() will increment the prog refcnt which is paired with bpf_timer_cancel() that will drop the prog refcnt. The ops->map_release_uref is responsible for cancelling the timers and dropping prog refcnt when user space reference to a map reaches zero. This uref approach is done to make sure that Ctrl-C of user space process will not leave timers running forever unless the user space explicitly pinned a map that contained timers in bpffs. bpf_timer_init() and bpf_timer_set_callback() will return -EPERM if map doesn't have user references (is not held by open file descriptor from user space and not pinned in bpffs). The bpf_map_delete_elem() and bpf_map_update_elem() operations cancel and free the timer if given map element had it allocated. "bpftool map update" command can be used to cancel timers. The 'struct bpf_timer' is explicitly __attribute__((aligned(8))) because '__u64 :64' has 1 byte alignment of 8 byte padding. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-4-alexei.starovoitov@gmail.com	2021-07-15 22:31:10 +02:00
Kuniyuki Iwashima	f170acda7f	bpf: Fix a typo of reuseport map in bpf.h. Fix s/BPF_MAP_TYPE_REUSEPORT_ARRAY/BPF_MAP_TYPE_REUSEPORT_SOCKARRAY/ typo in bpf.h. Fixes: `2dbb9b9e6d` ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210714124317.67526-1-kuniyu@amazon.co.jp	2021-07-14 18:12:05 -07:00
Linus Torvalds	8096acd744	Networking fixes for 5.14-rc2, including fixes from bpf and netfilter. Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state - netfilter: conntrack: do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0 vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw 4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs= =QFnb -----END PGP SIGNATURE----- Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski. "Including fixes from bpf and netfilter. Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: - do not renew entry stuck in tcp SYN_SENT state - do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison" * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits) net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() sfc: add logs explaining XDP_TX/REDIRECT is not available sfc: ensure correct number of XDP queues sfc: fix lack of XDP TX queues - error XDP TX failed (-22) net: fddi: fix UAF in fza_probe net: dsa: sja1105: fix address learning getting disabled on the CPU port net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload net: Use nlmsg_unicast() instead of netlink_unicast() octeontx2-pf: Fix uninitialized boolean variable pps ipv6: allocate enough headroom in ip6_finish_output2() net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific net: bridge: multicast: fix MRD advertisement router port marking race net: bridge: multicast: fix PIM hello router port marking race net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340 dsa: fix for_each_child.cocci warnings virtio_net: check virtqueue_add_sgs() return value mptcp: properly account bulk freed memory selftests: mptcp: fix case multiple subflows limited by server mptcp: avoid processing packet if a subflow reset mptcp: fix syncookie process if mptcp can not_accept new subflow ...	2021-07-14 09:24:32 -07:00
Lukas Bulwahn	514305ee0a	RDMA/irdma: Make spdxcheck.py happy Commit `48d6b3336a` ("RDMA/irdma: Add ABI definitions") adds ./include/uapi/rdma/irdma-abi.h with an additional unneeded closing bracket at the end of the SPDX-License-Identifier line. Hence, ./scripts/spdxcheck.py complains: include/uapi/rdma/irdma-abi.h: 1:77 Syntax error: ) Remove that closing bracket to make spdxcheck.py happy. Fixes: `48d6b3336a` ("RDMA/irdma: Add ABI definitions") Link: https://lore.kernel.org/r/20210701104127.1877-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-07-12 14:52:25 -03:00

1 2 3 4 5 ...

10512 Commits