linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Phil Sutter	5f48846daf	netfilter: nf_tables: Enable fast nft_cmp for inverted matches Add a boolean indicating NFT_CMP_NEQ. To include it into the match decision, it is sufficient to XOR it with the data comparison's result. While being at it, store the mask that is calculated during expression init and free the eval routine from having to recalculate it each time. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-04 21:08:32 +02:00
Florian Westphal	ab6c41eefd	netfilter: nfnetlink: place subsys mutexes in distinct lockdep classes From time to time there are lockdep reports similar to this one: WARNING: possible circular locking dependency detected ------------------------------------------------------ 000000004f61aa56 (&table[i].mutex){+.+.}, at: nfnl_lock [nfnetlink] but task is already holding lock: [..] (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid [nf_tables] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&net->nft.commit_mutex){+.+.}: [..] nf_tables_valid_genid+0x18/0x60 [nf_tables] nfnetlink_rcv_batch+0x24c/0x620 [nfnetlink] nfnetlink_rcv+0x110/0x140 [nfnetlink] netlink_unicast+0x12c/0x1e0 [..] sys_sendmsg+0x18/0x40 linux_sparc_syscall+0x34/0x44 -> #0 (&table[i].mutex){+.+.}: [..] nfnl_lock+0x24/0x40 [nfnetlink] ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set] set_match_v1_checkentry+0x14/0xc0 [xt_set] xt_check_match+0x238/0x260 [x_tables] __nft_match_init+0x160/0x180 [nft_compat] [..] sys_sendmsg+0x18/0x40 linux_sparc_syscall+0x34/0x44 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&net->nft.commit_mutex); lock(&table[i].mutex); lock(&net->nft.commit_mutex); lock(&table[i].mutex); Lockdep considers this an ABBA deadlock because the different nfnl subsys mutexes reside in the same lockdep class, but this is a false positive. CPU1 table[i] refers to the nftables subsys mutex, whereas CPU1 locks the ipset subsys mutex. Yi Che reported a similar lockdep splat, this time between ipset and ctnetlink subsys mutexes. Time to place them in distinct classes to avoid these warnings. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-04 21:08:32 +02:00
Vasily Averin	9446ab34ac	netfilter: ipset: enable memory accounting for ipset allocations Currently netadmin inside non-trusted container can quickly allocate whole node's memory via request of huge ipset hashtable. Other ipset-related memory allocations should be restricted too. v2: fixed typo ALLOC -> ACCOUNT Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-04 21:08:25 +02:00
YueHaibing	82ec6630f9	netfilter: nf_tables_offload: Remove unused macro FLOW_SETUP_BLOCK commit `9a32669fec` ("netfilter: nf_tables_offload: support indr block call") left behind this. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-04 21:07:55 +02:00
Matthieu Baerts	456afe01b1	mptcp: ADD_ADDRs with echo bit are smaller The MPTCP ADD_ADDR suboption with echo-flag=1 has no HMAC, the size is smaller than the one initially sent without echo-flag=1. We then need to use the correct size everywhere when we need this echo bit. Before this patch, the wrong size was reserved but the correct amount of bytes were written (and read): the remaining bytes contained garbage. Fixes: `6a6c05a8b0` ("mptcp: send out ADD_ADDR with echo flag") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/95 Reported-and-tested-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 17:36:37 -07:00
Guillaume Nault	a45294af9e	net/sched: act_mpls: Add action to push MPLS LSE before Ethernet header Define the MAC_PUSH action which pushes an MPLS LSE before the mac header (instead of between the mac and the network headers as the plain PUSH action does). The only special case is when the skb has an offloaded VLAN. In that case, it has to be inlined before pushing the MPLS header. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 17:28:45 -07:00
Guillaume Nault	19fbcb36a3	net/sched: act_vlan: Add {POP,PUSH}_ETH actions Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to respectively pop and push a base Ethernet header at the beginning of a frame. POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any, must be stripped before calling POP_ETH. PUSH_ETH is restricted to skbs with no mac_header, and only the MAC addresses can be configured. The Ethertype is automatically set from skb->protocol. These restrictions ensure that all skb's fields remain consistent, so that this action can't confuse other part of the networking stack (like GSO). Since openvswitch already had these actions, consolidate the code in skbuff.c (like for vlan and mpls push/pop). Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 17:28:45 -07:00
Karsten Graul	fd6ebb6fb2	net/smc: use an array to check fields in system EID The check for old hardware versions that did not have SMCDv2 support was using suspicious pointer magic. Address the fields using an array. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 17:04:48 -07:00
Karsten Graul	839d696ffb	net/smc: send ISM devices with unique chid in CLC proposal When building a CLC proposal message then the list of ISM devices does not need to contain multiple devices that have the same chid value, all these devices use the same function at the end. Improve smc_find_ism_v2_device_clnt() to collect only ISM devices that have unique chid values. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 17:04:48 -07:00
Yuchung Cheng	9cd8b6c905	tcp: account total lost packets properly The retransmission refactoring patch `686989700c` ("tcp: simplify tcp_mark_skb_lost") does not properly update the total lost packet counter which may break the policer mode in BBR. This patch fixes it. Fixes: `686989700c` ("tcp: simplify tcp_mark_skb_lost") Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 16:57:18 -07:00
Julian Wiedmann	a29f245ec9	net/iucv: fix indentation in __iucv_message_receive() smatch complains about net/iucv/iucv.c:1119 __iucv_message_receive() warn: inconsistent indenting While touching this line, also make the return logic consistent and thus get rid of a goto label. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 16:51:07 -07:00
Julian Wiedmann	398999bac6	net/af_iucv: right-size the uid variable in iucv_sock_bind() smatch complains about net/iucv/af_iucv.c:624 iucv_sock_bind() error: memcpy() 'sa->siucv_user_id' too small (8 vs 9) Which is absolutely correct - the memcpy() takes 9 bytes (sizeof(uid)) from an 8-byte field (sa->siucv_user_id). Luckily the sockaddr_iucv struct contains more data after the .siucv_user_id field, and we checked the size of the passed data earlier on. So the memcpy() won't accidentally read from an invalid location. Fix the warning by reducing the size of the uid variable to what's actually needed, and thus reducing the amount of copied data. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 16:51:07 -07:00
Jakub Kicinski	e992a6eda9	genetlink: allow dumping command-specific policy Right now CTRL_CMD_GETPOLICY can only dump the family-wide policy. Support dumping policy of a specific op. v3: - rebase after per-op policy export and handle that v2: - make cmd U32, just in case. v1: - don't echo op in the output in a naive way, this should make it cleaner to extend the output format for dumping policies for all the commands at once in the future. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20201001225933.1373426-11-kuba@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 14:18:29 -07:00
Johannes Berg	50a896cf2d	genetlink: properly support per-op policy dumping Add support for per-op policy dumping. The data is pretty much as before, except that now the assumption that the policy with index 0 is "the" policy no longer holds - you now need to look at the new CTRL_ATTR_OP_POLICY attribute which is a nested attr (indexed by op) containing attributes for do and dump policies. When a single op is requested, the CTRL_ATTR_OP_POLICY will be added in the same way, since do and dump policies may differ. v2: - conditionally advertise per-command policies only if there actually is a policy being used for the do/dump and it's present at all Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 14:18:29 -07:00
Johannes Berg	aa85ee5f95	genetlink: factor skb preparation out of ctrl_dumppolicy() We'll need this later for the per-op policy index dump. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 14:18:29 -07:00
Johannes Berg	04a351a62b	netlink: rework policy dump to support multiple policies Rework the policy dump code a bit to support adding multiple policies to a single dump, in order to e.g. support per-op policies in generic netlink. v2: - move kernel-doc to implementation [Jakub] - squash the first patch to not flip-flop on the prototype [Jakub] - merge netlink_policy_dump_get_policy_idx() with the old get_policy_idx() we already had - rebase without Jakub's patch to have per-op dump Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 14:18:29 -07:00
Johannes Berg	899b07c578	netlink: compare policy more accurately The maxtype is really an integral part of the policy, and while we haven't gotten into a situation yet where this happens, it seems that some developer might eventually have two places pointing to identical policies, with different maxattr to exclude some attrs in one of the places. Even if not, it's really the right thing to compare both since the two data items fundamentally belong together. v2: - also do the proper comparison in get_policy_idx() Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 14:18:29 -07:00
Christoph Hellwig	89cd35c58b	iov_iter: transparently handle compat iovecs in import_iovec Use in compat_syscall to import either native or the compat iovecs, and remove the now superflous compat_import_iovec. This removes the need for special compat logic in most callers, and the remaining ones can still be simplified by using __import_iovec with a bool compat parameter. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-10-03 00:02:13 -04:00
Jakub Kicinski	a4bb4f5fc8	genetlink: switch control commands to per-op policies In preparation for adding a new attribute to CTRL_CMD_GETPOLICY split the policies for getpolicy and getfamily apart. This will cause a slight user-visible change in that dumping the policies will switch from per family to per op, but supposedly sniffer-type applications (which are the main use case for policy dumping thus far) should support both, anyway. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 19:11:12 -07:00
Jakub Kicinski	8e1ed28fd8	genetlink: use parsed attrs in dumppolicy Attributes are already parsed based on the policy specified in the family and ready-to-use in info->attrs. No need to call genlmsg_parse() again. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 19:11:12 -07:00
Jakub Kicinski	48526a0f4c	genetlink: bring back per op policy Add policy to the struct genl_ops structure, this time with maxattr, so it can be used properly. Propagate .policy and .maxattr from the family in genl_get_cmd() if needed, this way the rest of the code does not have to worry if the policy is per op or global. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 19:11:12 -07:00
Jakub Kicinski	78ade619c1	genetlink: use .start callback for dumppolicy The structure of ctrl_dumppolicy() is clearly split into init and dumping. Move the init to a .start callback for clarity, it's a more idiomatic netlink dump code structure. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 19:11:12 -07:00
Jakub Kicinski	adc848450f	genetlink: add a structure for dump state Whenever netlink dump uses more than 2 cb->args[] entries code gets hard to read. We're about to add more state to ctrl_dumppolicy() so create a structure. Since the structure is typed and clearly named we can remove the local fam_id variable and use ctx->fam_id directly. v3: - rebase onto explicit free fix v1: - s/nl_policy_dump/netlink_policy_dump_state/ - forward declare struct netlink_policy_dump_state, and move from passing unsigned long to actual pointer type - add build bug on - u16 fam_id - s/args/ctx/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 19:11:12 -07:00
Jakub Kicinski	66a9b9287d	genetlink: move to smaller ops wherever possible Bulk of the genetlink users can use smaller ops, move them. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 19:11:11 -07:00
Jakub Kicinski	0b588afdd1	genetlink: add small version of ops We want to add maxattr and policy back to genl_ops, to enable dumping per command policy to user space. This, however, would cause bloat for all the families with global policies. Introduce smaller version of ops (half the size of genl_ops). Translate these smaller ops into a full blown struct before use in the core. v1: - use struct assignment - put a full copy of the op in struct genl_dumpit_info - s/light/small/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 19:11:11 -07:00
Ioana Ciornei	c50bf2be73	devlink: add .trap_group_action_set() callback Add a new devlink callback, .trap_group_action_set(), which can be used by device drivers which do not support controlling the action (drop, trap) on each trap but rather on the entire group trap. If this new callback is populated, it will take precedence over the .trap_action_set() callback when the user requests a change of all the traps in a group. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 16:31:56 -07:00
Ioana Ciornei	10c24eb23d	devlink: add parser error drop packet traps Add parser error drop packet traps, so that capable device driver could register them with devlink. The new packet trap group holds any drops of packets which were marked by the device as erroneous during header parsing. Add documentation for every added packet trap and packet trap group. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 16:31:55 -07:00
Paolo Abeni	9d8c05ad56	tcp: fix syn cookied MPTCP request socket leak If a syn-cookies request socket don't pass MPTCP-level validation done in syn_recv_sock(), we need to release it immediately, or it will be leaked. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/89 Fixes: `9466a1cceb` ("mptcp: enable JOIN requests even if cookies are in use") Reported-and-tested-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 15:34:38 -07:00
David S. Miller	26d0a8edca	Another set of changes, this time with: * lots more S1G band support * 6 GHz scanning, finally * kernel-doc fixes * non-split wiphy dump fixes in nl80211 * various other small cleanups/features -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl92/IsACgkQB8qZga/f l8QrBxAAi8HJdFyZtCIuLrXL3KHzey5AjmrYuHdsFcdk8NJZkEco17I3l05D9ek6 76VvqjiYzDwdmgoHr3yz0K7pOAoTRpBKlaecvZLPXWf2bVhebWSU5EPcrTZHolrJ JBoBj4FU6Im/MnFbeiKxPj3M+NTQrLdekODSeaC5hFhi/oSF9lap6RMC8sz4YrVp 9yKzB8zjz+eL4wL3EsztEzpTxbvHTaVMe0XBVou7Fg2ZauJGwqMxpIukpMWUmmNr EequhVFpdlXbVMle8wP4ZR58c4+O1kbRoYL9WhAILtdDhCKfLccWXnlUjzuQlCeB RH/jzG7AlVhm972oUuqG9szAcU8hEgWdsNEML7pilXmFk/ZSNLpUZfZCAILn+Gd3 8oMQnXp2br+DLzf1SO7cxpL2KrTNjrb4gcJVBJ9eBlDjK/64N22MqZkpOKcMxq51 ocmf1MJ1TbAbZn/kY2hsoaPYt2+bm1umMa/t/Pwuds+xKZEOOuPNgZQILcAsfJZB 2OWDDT+RNLo/K4mPETtyQQZoCxAWB9n/CcnU+UTsmnUmMsEnCEbPnbYKBrc6jX1l jSP6XUD8fxhB2lfW+SPtQPnAi86+gblXVvEO8zm0+ez3juItlIsVcRY8ey/j9N1F uorpycvrfl6+Q1+mmhe6et2r+TLdGs73I0PJ44HRA9JZexKBrDg= =B9Xl -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-net-next-2020-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Another set of changes, this time with: * lots more S1G band support * 6 GHz scanning, finally * kernel-doc fixes * non-split wiphy dump fixes in nl80211 * various other small cleanups/features ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 15:33:13 -07:00
Coly Li	40efc4dc73	libceph: use sendpage_ok() in ceph_tcp_sendpage() In libceph, ceph_tcp_sendpage() does the following checks before handle the page by network layer's zero copy sendpage method, if (page_count(page) >= 1 && !PageSlab(page)) This check is exactly what sendpage_ok() does. This patch replace the open coded checks by sendpage_ok() as a code cleanup. Signed-off-by: Coly Li <colyli@suse.de> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 15:27:08 -07:00
Coly Li	cf83a17ede	tcp: use sendpage_ok() to detect misused .sendpage commit `a10674bf24` ("tcp: detecting the misuse of .sendpage for Slab objects") adds the checks for Slab pages, but the pages don't have page_count are still missing from the check. Network layer's sendpage method is not designed to send page_count 0 pages neither, therefore both PageSlab() and page_count() should be both checked for the sending page. This is exactly what sendpage_ok() does. This patch uses sendpage_ok() in do_tcp_sendpages() to detect misused .sendpage, to make the code more robust. Fixes: `a10674bf24` ("tcp: detecting the misuse of .sendpage for Slab objects") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Coly Li <colyli@suse.de> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: David S. Miller <davem@davemloft.net> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 15:27:08 -07:00
Coly Li	7b62d31d3f	net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send If a page sent into kernel_sendpage() is a slab page or it doesn't have ref_count, this page is improper to send by the zero copy sendpage() method. Otherwise such page might be unexpected released in network code path and causes impredictable panic due to kernel memory management data structure corruption. This path adds a WARN_ON() on the sending page before sends it into the concrete zero-copy sendpage() method, if the page is improper for the zero-copy sendpage() method, a warning message can be observed before the consequential unpredictable kernel panic. This patch does not change existing kernel_sendpage() behavior for the improper page zero-copy send, it just provides hint warning message for following potential panic due the kernel memory heap corruption. Signed-off-by: Coly Li <colyli@suse.de> Cc: Cong Wang <amwang@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David S. Miller <davem@davemloft.net> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 15:27:08 -07:00
John Fastabend	18ebe16d10	bpf, sockmap: Add skb_adjust_room to pop bytes off ingress payload This implements a new helper skb_adjust_room() so users can push/pop extra bytes from a BPF_SK_SKB_STREAM_VERDICT program. Some protocols may include headers and other information that we may not want to include when doing a redirect from a BPF_SK_SKB_STREAM_VERDICT program. One use case is to redirect TLS packets into a receive socket that doesn't expect TLS data. In TLS case the first 13B or so contain the protocol header. With KTLS the payload is decrypted so we should be able to redirect this to a receiving socket, but the receiving socket may not be expecting to receive a TLS header and discard the data. Using the above helper we can pop the header off and put an appropriate header on the payload. This allows for creating a proxy between protocols without extra hops through the stack or userspace. So in order to fix this case add skb_adjust_room() so users can strip the header. After this the user can strip the header and an unmodified receiver thread will work correctly when data is redirected into the ingress path of a sock. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160160099197.7052.8443193973242831692.stgit@john-Precision-5820-Tower	2020-10-02 15:18:39 -07:00
Florian Fainelli	3a68844dd2	net: dsa: Utilize __vlan_find_dev_deep_rcu() Now that we are guaranteed that dsa_untag_bridge_pvid() is called after eth_type_trans() we can utilize __vlan_find_dev_deep_rcu() which will take care of finding an 802.1Q upper on top of a bridge master. A common use case, prior to 12a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") was to configure a bridge 802.1Q upper like this: ip link add name br0 type bridge vlan_filtering 0 ip link add link br0 name br0.1 type vlan id 1 in order to pop the default_pvid VLAN tag. With this change we restore that behavior while still allowing the DSA receive path to automatically pop the VLAN tag. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:36:07 -07:00
Florian Fainelli	a348292b63	net: dsa: Obtain VLAN protocol from skb->protocol Now that dsa_untag_bridge_pvid() is called after eth_type_trans() we are guaranteed that skb->protocol will be set to a correct value, thus allowing us to avoid calling vlan_eth_hdr(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:36:07 -07:00
Florian Fainelli	1c5ad5a940	net: dsa: b53: Set untag_bridge_pvid Indicate to the DSA receive path that we need to untage the bridge PVID, this allows us to remove the dsa_untag_bridge_pvid() calls from net/dsa/tag_brcm.c. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:36:07 -07:00
Florian Fainelli	1dc0408cdf	net: dsa: Call dsa_untag_bridge_pvid() from dsa_switch_rcv() When a DSA switch driver needs to call dsa_untag_bridge_pvid(), it can set dsa_switch::untag_brige_pvid to indicate this is necessary. This is a pre-requisite to making sure that we are always calling dsa_untag_bridge_pvid() after eth_type_trans() has been called. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:36:07 -07:00
David S. Miller	c16bcd70a1	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-10-02 1) Add a full xfrm compatible layer for 32-bit applications on 64-bit kernels. From Dmitry Safonov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:16:15 -07:00
Johannes Berg	949ca6b82e	netlink: fix policy dump leak [ Upstream commit `a95bc734e6` ] If userspace doesn't complete the policy dump, we leak the allocated state. Fix this. Fixes: `d07dcf9aad` ("netlink: add infrastructure to expose policies to userspace") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:07:42 -07:00
Johannes Berg	a95bc734e6	netlink: fix policy dump leak If userspace doesn't complete the policy dump, we leak the allocated state. Fix this. Fixes: `d07dcf9aad` ("netlink: add infrastructure to expose policies to userspace") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 13:00:38 -07:00
Martin KaFai Lau	82f45c6c4a	bpf: tcp: Do not limit cb_flags when creating child sk from listen sk The commit `0813a84156` ("bpf: tcp: Allow bpf prog to write and parse TCP header option") unnecessarily introduced bpf_skops_init_child() which limited the child sk from inheriting all bpf_sock_ops_cb_flags of the listen sk. That breaks existing user expectation. This patch removes the bpf_skops_init_child() and just allows sock_copy() to do its job to copy everything from listen sk to the child sk. Fixes: `0813a84156` ("bpf: tcp: Allow bpf prog to write and parse TCP header option") Reported-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201002013448.2542025-1-kafai@fb.com	2020-10-02 11:34:48 -07:00
Thomas Pedersen	75f87eaeac	mac80211: avoid processing non-S1G elements on S1G band In ieee80211_determine_chantype(), the sband->ht_cap was being processed before S1G Operation element. Since the HT capability element should not be present on the S1G band, avoid processing potential garbage by moving the call to ieee80211_apply_htcap_overrides() to after the S1G block. Also, in case of a missing S1G Operation element, we would continue trying to process non-S1G elements (and return with a channel width of 20MHz). Instead, just assume primary channel is equal to operating and infer the operating width from the BSS channel, then return. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20201001174748.24520-1-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-10-02 12:07:24 +02:00
Johannes Berg	ab10c22bc3	nl80211: fix non-split wiphy information When dumping wiphy information, we try to split the data into many submessages, but for old userspace we still support the old mode where this doesn't happen. However, in this case we were not resetting our state correctly and dumping multiple messages for each wiphy, which would have broken such older userspace. This was broken pretty much immediately afterwards because it only worked in the original commit where non-split dumps didn't have any more data than split dumps... Fixes: `fe1abafd94` ("nl80211: re-add channel width and extended capa advertising") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200928130717.3e6d9c6bada2.Ie0f151a8d0d00a8e1e18f6a8c9244dd02496af67@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-10-02 12:07:09 +02:00
Johannes Berg	f8d504caa9	nl80211: reduce non-split wiphy dump size When wiphy dumps cannot be split, such as in events or with older userspace that doesn't support it, the size can today be too big. Reduce it, by doing two things: 1) remove data that couldn't have been present before the split capability was introduced since it's new, such as HE capabilities 2) as suggested by Martin Willi, remove management frame subtypes from the split dumps, as just (1) isn't even enough due to other new code capabilities. This is fine as old consumers (really just wpa_supplicant) didn't check this data before they got support for split dumps. Reported-by: Martin Willi <martin@strongswan.org> Suggested-by: Martin Willi <martin@strongswan.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Tested-by: Martin Willi <martin@strongswan.org> Link: https://lore.kernel.org/r/20200928130655.53bce7873164.I71f06c9a221cd0630429a1a56eeae68a13beca61@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-10-02 12:06:49 +02:00
Ye Bin	000fe2685b	net-sysfs: Fix inconsistent of format with argument type in net-sysfs.c Fix follow warnings: [net/core/net-sysfs.c:1161]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'int'. [net/core/net-sysfs.c:1162]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'int'. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 18:45:23 -07:00
Ye Bin	32be425b45	pktgen: Fix inconsistent of format with argument type in pktgen.c Fix follow warnings: [net/core/pktgen.c:925]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:942]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:962]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:984]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:1149]: (warning) %d in format string (no. 1) requires 'int' but the argument type is 'unsigned int'. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 18:45:23 -07:00
David S. Miller	23a1f682a9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-10-01 The following pull-request contains BPF updates for your net-next tree. We've added 90 non-merge commits during the last 8 day(s) which contain a total of 103 files changed, 7662 insertions(+), 1894 deletions(-). Note that once bpf(/net) tree gets merged into net-next, there will be a small merge conflict in tools/lib/bpf/btf.c between commit `1245008122` ("libbpf: Fix native endian assumption when parsing BTF") from the bpf tree and the commit `3289959b97` ("libbpf: Support BTF loading and raw data output in both endianness") from the bpf-next tree. Correct resolution would be to stick with bpf-next, it should look like: [...] /* check BTF magic / if (fread(&magic, 1, sizeof(magic), f) < sizeof(magic)) { err = -EIO; goto err_out; } if (magic != BTF_MAGIC && magic != bswap_16(BTF_MAGIC)) { / definitely not a raw BTF / err = -EPROTO; goto err_out; } / get file size */ [...] The main changes are: 1) Add bpf_snprintf_btf() and bpf_seq_printf_btf() helpers to support displaying BTF-based kernel data structures out of BPF programs, from Alan Maguire. 2) Speed up RCU tasks trace grace periods by a factor of 50 & fix a few race conditions exposed by it. It was discussed to take these via BPF and networking tree to get better testing exposure, from Paul E. McKenney. 3) Support multi-attach for freplace programs, needed for incremental attachment of multiple XDP progs using libxdp dispatcher model, from Toke Høiland-Jørgensen. 4) libbpf support for appending new BTF types at the end of BTF object, allowing intrusive changes of prog's BTF (useful for future linking), from Andrii Nakryiko. 5) Several BPF helper improvements e.g. avoid atomic op in cookie generator and add a redirect helper into neighboring subsys, from Daniel Borkmann. 6) Allow map updates on sockmaps from bpf_iter context in order to migrate sockmaps from one to another, from Lorenz Bauer. 7) Fix 32 bit to 64 bit assignment from latest alu32 bounds tracking which caused a verifier issue due to type downgrade to scalar, from John Fastabend. 8) Follow-up on tail-call support in BPF subprogs which optimizes x64 JIT prologue and epilogue sections, from Maciej Fijalkowski. 9) Add an option to perf RB map to improve sharing of event entries by avoiding remove- on-close behavior. Also, add BPF_PROG_TEST_RUN for raw_tracepoint, from Song Liu. 10) Fix a crash in AF_XDP's socket_release when memory allocation for UMEMs fails, from Magnus Karlsson. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 14:29:01 -07:00
Yonghong Song	d82a532a61	bpf: Fix "unresolved symbol" build error with resolve_btfids Michal reported a build failure likes below: BTFIDS vmlinux FAILED unresolved symbol tcp_timewait_sock make[1]: *** [/.../linux-5.9-rc7/Makefile:1176: vmlinux] Error 255 This error can be triggered when config has CONFIG_NET enabled but CONFIG_INET disabled. In this case, there is no user of istructs inet_timewait_sock and tcp_timewait_sock and hence vmlinux BTF types are not generated for these two structures. To fix the problem, let us force BTF generation for these two structures with BTF_TYPE_EMIT. Fixes: `fce557bcef` ("bpf: Make btf_sock_ids global") Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201001051339.2549085-1-yhs@fb.com	2020-10-01 18:38:50 +02:00
Jens Axboe	a3ec600540	io_uring: move io_uring_get_socket() into io_uring.h Now we have a io_uring kernel header, move this definition out of fs.h and into io_uring.h where it belongs. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-30 20:32:33 -06:00
Ido Schimmel	93e155967c	drop_monitor: Filter control packets in drop monitor Previously, devlink called into drop monitor in order to report hardware originated drops / exceptions. devlink intentionally filtered control packets and did not pass them to drop monitor as they were not dropped by the underlying hardware. Now drop monitor registers its probe on a generic 'devlink_trap_report' tracepoint and should therefore perform this filtering itself instead of having devlink do that. Add the trap type as metadata and have drop monitor ignore control packets. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	a848c05f4b	drop_monitor: Remove duplicate struct 'struct net_dm_hw_metadata' is a duplicate of 'struct devlink_trap_metadata'. Remove the former and simplify the code. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	de9cbb81bd	drop_monitor: Remove no longer used functions The old probe functions that were invoked by drop monitor code are no longer called and can thus be removed. They were replaced by actual probe functions that are registered on the recently introduced 'devlink_trap_report' tracepoint. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	8ee2267ad3	drop_monitor: Convert to using devlink tracepoint Convert drop monitor to use the recently introduced 'devlink_trap_report' tracepoint instead of having devlink call into drop monitor. This is both consistent with software originated drops ('kfree_skb' tracepoint) and also allows drop monitor to be built as a module and still report hardware originated drops. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	5855357cd4	drop_monitor: Prepare probe functions for devlink tracepoint Drop monitor supports two alerting modes: Summary and packet. Prepare a probe function for each, so that they could be later registered on the devlink tracepoint by calling register_trace_devlink_trap_report(), based on the configured alerting mode. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	5b88823bfe	devlink: Add a tracepoint for trap reports Add a tracepoint for trap reports so that drop monitor could register its probe on it. Use trace_devlink_trap_report_enabled() to avoid wasting cycles setting the trap metadata if the tracepoint is not enabled. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Eric Dumazet	a37c2134be	tcp: add exponential backoff in __tcp_send_ack() Whenever host is under very high memory pressure, __tcp_send_ack() skb allocation fails, and we setup a 200 ms (TCP_DELACK_MAX) timer before retrying. On hosts with high number of TCP sockets, we can spend considerable amount of cpu cycles in these attempts, add high pressure on various spinlocks in mm-layer, ultimately blocking threads attempting to free space from making any progress. This patch adds standard exponential backoff to avoid adding fuel to the fire. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 14:21:30 -07:00
Eric Dumazet	b6b6d6533a	inet: remove icsk_ack.blocked TCP has been using it to work around the possibility of tcp_delack_timer() finding the socket owned by user. After commit `6f458dfb40` ("tcp: improve latencies of timer triggered events") we added TCP_DELACK_TIMER_DEFERRED atomic bit for more immediate recovery, so we can get rid of icsk_ack.blocked This frees space that following patch will reuse. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 14:21:30 -07:00
Daniel Borkmann	b4ab314149	bpf: Add redirect_neigh helper as redirect drop-in Add a redirect_neigh() helper as redirect() drop-in replacement for the xmit side. Main idea for the helper is to be very similar in semantics to the latter just that the skb gets injected into the neighboring subsystem in order to let the stack do the work it knows best anyway to populate the L2 addresses of the packet and then hand over to dev_queue_xmit() as redirect() does. This solves two bigger items: i) skbs don't need to go up to the stack on the host facing veth ingress side for traffic egressing the container to achieve the same for populating L2 which also has the huge advantage that ii) the skb->sk won't get orphaned in ip_rcv_core() when entering the IP routing layer on the host stack. Given that skb->sk neither gets orphaned when crossing the netns as per `9c4c325252` ("skbuff: preserve sock reference when scrubbing the skb.") the helper can then push the skbs directly to the phys device where FQ scheduler can do its work and TCP stack gets proper backpressure given we hold on to skb->sk as long as skb is still residing in queues. With the helper used in BPF data path to then push the skb to the phys device, I observed a stable/consistent TCP_STREAM improvement on veth devices for traffic going container -> host -> host -> container from ~10Gbps to ~15Gbps for a single stream in my test environment. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net	2020-09-30 11:50:35 -07:00
Daniel Borkmann	92acdc58ab	bpf, net: Rework cookie generator as per-cpu one With its use in BPF, the cookie generator can be called very frequently in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg) and attached to the root cgroup, for example, when used in v1/v2 mixed environments. In particular, when there's a high churn on sockets in the system there can be many parallel requests to the bpf_get_socket_cookie() and bpf_get_netns_cookie() helpers which then cause contention on the atomic counter. As similarly done in `f991bd2e14` ("fs: introduce a per-cpu last_ino allocator"), add a small helper library that both can use for the 64 bit counters. Given this can be called from different contexts, we also need to deal with potential nested calls even though in practice they are considered extremely rare. One idea as suggested by Eric Dumazet was to use a reverse counter for this situation since we don't expect 64 bit overflows anyways; that way, we can avoid bigger gaps in the 64 bit counter space compared to just batch-wise increase. Even on machines with small number of cores (e.g. 4) the cookie generation shrinks from min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run in parallel from multiple CPUs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net	2020-09-30 11:50:35 -07:00
Daniel Borkmann	b426ce83ba	bpf: Add classid helper only based on skb->sk Similarly to `5a52ae4e32` ("bpf: Allow to retrieve cgroup v1 classid from v2 hooks"), add a helper to retrieve cgroup v1 classid solely based on the skb->sk, so it can be used as key as part of BPF map lookups out of tc from host ns, in particular given the skb->sk is retained these days when crossing net ns thanks to `9c4c325252` ("skbuff: preserve sock reference when scrubbing the skb."). This is similar to bpf_skb_cgroup_id() which implements the same for v2. Kubernetes ecosystem is still operating on v1 however, hence net_cls needs to be used there until this can be dropped in with the v2 helper of bpf_skb_cgroup_id(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net	2020-09-30 11:50:34 -07:00
Song Liu	963ec27a10	bpf: fix raw_tp test run in preempt kernel In preempt kernel, BPF_PROG_TEST_RUN on raw_tp triggers: [ 35.874974] BUG: using smp_processor_id() in preemptible [00000000] code: new_name/87 [ 35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0 [ 35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1 [ 35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 35.916941] Call Trace: [ 35.919660] dump_stack+0x77/0x9b [ 35.923273] check_preemption_disabled+0xb4/0xc0 [ 35.928376] bpf_prog_test_run_raw_tp+0xd4/0x1b0 [ 35.933872] ? selinux_bpf+0xd/0x70 [ 35.937532] __do_sys_bpf+0x6bb/0x21e0 [ 35.941570] ? find_held_lock+0x2d/0x90 [ 35.945687] ? vfs_write+0x150/0x220 [ 35.949586] do_syscall_64+0x2d/0x40 [ 35.953443] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by calling migrate_disable() before smp_processor_id(). Fixes: `1b4d60ec16` ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-09-30 08:34:08 -07:00
Jose M. Guisado Gomez	002f217653	netfilter: nf_tables: add userdata attributes to nft_chain Enables storing userdata for nft_chain. Field udata points to user data and udlen stores its length. Adds new attribute flag NFTA_CHAIN_USERDATA. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-30 11:46:50 +02:00
Jose M. Guisado Gomez	85db827a57	netfilter: nf_tables: use nla_memdup to copy udata When userdata support was added to tables and objects, user data coming from user space was allocated and copied using kzalloc + nla_memcpy. Use nla_memdup to copy userdata of tables and objects. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-30 11:45:14 +02:00
Jose M. Guisado Gomez	bc7a708235	netfilter: nf_tables: fix userdata memleak When userdata was introduced for tables and objects its allocation was only freed inside the error path of the new{table, object} functions. Free user data inside corresponding destroy functions for tables and objects. Fixes: `b131c96496` ("netfilter: nf_tables: add userdata support for nft_object") Fixes: `7a81575b80` ("netfilter: nf_tables: add userdata attributes to nft_table") Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-30 11:40:22 +02:00
David S. Miller	1f25c9bbfd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2020-09-29 The following pull-request contains BPF updates for your net tree. We've added 7 non-merge commits during the last 14 day(s) which contain a total of 7 files changed, 28 insertions(+), 8 deletions(-). The main changes are: 1) fix xdp loading regression in libbpf for old kernels, from Andrii. 2) Do not discard packet when NETDEV_TX_BUSY, from Magnus. 3) Fix corner cases in libbpf related to endianness and kconfig, from Tony. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 01:49:20 -07:00
Mat Martineau	1a49b2c2a5	mptcp: Handle incoming 32-bit DATA_FIN values The peer may send a DATA_FIN mapping with either a 32-bit or 64-bit sequence number. When a 32-bit sequence number is received for the DATA_FIN, it must be expanded to 64 bits before comparing it to the last acked sequence number. This expansion was missing. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/93 Fixes: `3721b9b646` ("mptcp: Track received DATA_FIN sequence number and add related helpers") Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-29 18:15:46 -07:00
Mat Martineau	917944da3b	mptcp: Consistently use READ_ONCE/WRITE_ONCE with msk->ack_seq The msk->ack_seq value is sometimes read without the msk lock held, so make proper use of READ_ONCE and WRITE_ONCE. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-29 18:15:40 -07:00
Sebastian Andrzej Siewior	c11171a413	net: Add netif_rx_any_context() Quite some drivers make conditional decisions based on in_interrupt() to invoke either netif_rx() or netif_rx_ni(). Conditionals based on in_interrupt() or other variants of preempt count checks in drivers should not exist for various reasons and Linus clearly requested to either split the code pathes or pass an argument to the common functions which provides the context. This is obviously the correct solution, but for some of the affected drivers this needs a major rewrite due to their convoluted structure. As in_interrupt() usage in drivers needs to be phased out, provide netif_rx_any_context() as a stop gap for these drivers. This confines the in_interrupt() conditional to core code which in turn allows to remove the access to this check for driver code and provides one central place to do further modifications once the driver maze is cleaned up. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-29 14:02:53 -07:00
Tom Parkin	3f47cb4c1c	l2tp: report rx cookie discards in netlink get When an L2TPv3 session receives a data frame with an incorrect cookie l2tp_core logs a warning message and bumps a stats counter to reflect the fact that the packet has been dropped. However, the stats counter in question is missing from the l2tp_netlink get message for tunnel and session instances. Include the statistic in the netlink get response. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-29 13:26:36 -07:00
David S. Miller	2bd056f550	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2020-09-29 Here's the main bluetooth-next pull request for 5.10: - Multiple fixes to suspend/resume handling - Added mgmt events for controller suspend/resume state - Improved extended advertising support - btintel: Enhanced support for next generation controllers - Added Qualcomm Bluetooth SoC WCN6855 support - Several other smaller fixes & improvements ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-29 13:22:53 -07:00
Ciara Loftus	f1fc8ece6c	xsk: Fix a documentation mistake in xsk_queue.h After 'peeking' the ring, the consumer, not the producer, reads the data. Fix this mistake in the comments. Fixes: `15d8c9162c` ("xsk: Add function naming comments and reorder functions") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20200928082344.17110-1-ciara.loftus@intel.com	2020-09-29 11:25:56 -07:00
Jakub Kicinski	78b70155dc	ethtool: mark netlink family as __ro_after_init Like all genl families ethtool_genl_family needs to not be a straight up constant, because it's modified/initialized by genl_register_family(). After init, however, it's only passed to genlmsg_put() & co. therefore we can mark it as __ro_after_init. Since genl_family structure contains function pointers mark this as a fix. Fixes: `2b4a8990b7` ("ethtool: introduce ethtool netlink interface") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 18:52:50 -07:00
Gustavo A. R. Silva	d61491a51f	net/sched: cls_u32: Replace one-element array with flexible-array member There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Refactor the code according to the use of a flexible-array member in struct tc_u_hnode and use the struct_size() helper to calculate the size for the allocations. Commit `5778d39d07` ("net_sched: fix struct tc_u_hnode layout in u32") makes it clear that the code is expected to dynamically allocate divisor + 1 entries for ->ht[] in tc_uhnode. Also, based on other observations, as the piece of code below: 1232 for (h = 0; h <= ht->divisor; h++) { 1233 for (n = rtnl_dereference(ht->ht[h]); 1234 n; 1235 n = rtnl_dereference(n->next)) { 1236 if (tc_skip_hw(n->flags)) 1237 continue; 1238 1239 err = u32_reoffload_knode(tp, n, add, cb, 1240 cb_priv, extack); 1241 if (err) 1242 return err; 1243 } 1244 } we can assume that, in general, the code is actually expecting to allocate that extra space for the one-element array in tc_uhnode, everytime it allocates memory for instances of tc_uhnode or tc_u_common structures. That's the reason for passing '1' as the last argument for struct_size() in the allocation for _root_ht_ and _tp_c_, and 'divisor + 1' in the allocation code for _ht_. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays Tested-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/5f7062af.z3T9tn9yIPv6h5Ny%25lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 18:48:42 -07:00
Lorenz Bauer	6550f2dddf	bpf: sockmap: Enable map_update_elem from bpf_iter Allow passing a pointer to a BTF struct sock_common* when updating a sockmap or sockhash. Since BTF pointers can fault and therefore be NULL at runtime we need to add an additional !sk check to sock_map_update_elem. Since we may be passed a request or timewait socket we also need to check sk_fullsock. Doing this allows calling map_update_elem on sockmap from bpf_iter context, which uses BTF pointers. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com	2020-09-28 16:40:46 -07:00
Davide Caratti	e5f7e211b6	ip6gre: avoid tx_error when sending MLD/DAD on external tunnels similarly to what has been done with commit `9d149045b3` ("geneve: change from tx_error to tx_dropped on missing metadata"), avoid reporting errors to userspace in case the kernel doesn't find any tunnel information for a skb that is going to be transmitted: an increase of tx_dropped is enough. tested with the following script: # for t in ip6gre ip6gretap ip6erspan; do > ip link add dev gre6-test0 type $t external > ip address add dev gre6-test0 2001:db8::1/64 > ip link set dev gre6-test0 up > sleep 30 > ip -s -j link show dev gre6-test0 \| jq \ > '.[0].stats64.tx \| {"errors": .errors, "dropped": .dropped}' > ip link del dev gre6-test0 > done Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 16:01:37 -07:00
Manivannan Sadhasivam	a7809ff90c	net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks The rcu read locks are needed to avoid potential race condition while dereferencing radix tree from multiple threads. The issue was identified by syzbot. Below is the crash report: ============================= WARNING: suspicious RCU usage 5.7.0-syzkaller #0 Not tainted ----------------------------- include/linux/radix-tree.h:176 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by kworker/u4:1/21: #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: spin_unlock_irq include/linux/spinlock.h:403 [inline] #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: process_one_work+0x6df/0xfd0 kernel/workqueue.c:2241 #1: ffffc90000dd7d80 ((work_completion)(&qrtr_ns.work)){+.+.}-{0:0}, at: process_one_work+0x71e/0xfd0 kernel/workqueue.c:2243 stack backtrace: CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.7.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: qrtr_ns_handler qrtr_ns_worker Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1e9/0x30e lib/dump_stack.c:118 radix_tree_deref_slot include/linux/radix-tree.h:176 [inline] ctrl_cmd_new_lookup net/qrtr/ns.c:558 [inline] qrtr_ns_worker+0x2aff/0x4500 net/qrtr/ns.c:674 process_one_work+0x76e/0xfd0 kernel/workqueue.c:2268 worker_thread+0xa7f/0x1450 kernel/workqueue.c:2414 kthread+0x353/0x380 kernel/kthread.c:268 Fixes: `0c2204a4ad` ("net: qrtr: Migrate nameservice to kernel from userspace") Reported-and-tested-by: syzbot+0f84f6eed90503da72fc@syzkaller.appspotmail.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:54:57 -07:00
Ursula Braun	e8d726c8e8	net/smc: CLC decline - V2 enhancements This patch covers the small SMCD version 2 changes for CLC decline. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:03 -07:00
Ursula Braun	b81a5eb789	net/smc: introduce CLC first contact extension SMC Version 2 defines a first contact extension for CLC accept and CLC confirm. This patch covers sending and receiving of the CLC first contact extension. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:03 -07:00
Ursula Braun	a7c9c5f4af	net/smc: CLC accept / confirm V2 The new format of SMCD V2 CLC accept and confirm is introduced, and building and checking of SMCD V2 CLC accepts / confirms is adapted accordingly. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:03 -07:00
Ursula Braun	5c21c4ccaf	net/smc: determine accepted ISM devices SMCD Version 2 allows to propose up to 8 additional ISM devices offered to the peer as candidates for SMCD communication. This patch covers the server side, i.e. selection of an ISM device matching one of the proposed ISM devices, that will be used for CLC accept Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:03 -07:00
Ursula Braun	8c3dca341a	net/smc: build and send V2 CLC proposal The new format of an SMCD V2 CLC proposal is introduced, and building and checking of SMCD V2 CLC proposals is adapted accordingly. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:03 -07:00
Ursula Braun	d70bf4f7a9	net/smc: determine proposed ISM devices SMCD Version 2 allows to propose up to 8 additional ISM devices offered to the peer as candidates for SMCD communication. This patch covers determination of the ISM devices to be proposed. ISM devices without PNETID are preferred, since ISM devices with PNETID are a V1 leftover and will disappear over the time. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:03 -07:00
Ursula Braun	e888a2e833	net/smc: introduce list of pnetids for Ethernet devices SMCD version 2 allows usage of ISM devices with hardware PNETID only, if an Ethernet net_device exists with the same hardware PNETID. This requires to maintain a list of pnetids belonging to Ethernet net_devices, which is covered by this patch. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:03 -07:00
Ursula Braun	8caaccf521	net/smc: introduce CHID callback for ISM devices With SMCD version 2 the CHIDs of ISM devices are needed for the CLC handshake. This patch provides the new callback to retrieve the CHID of an ISM device. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:03 -07:00
Ursula Braun	201091ebb2	net/smc: introduce System Enterprise ID (SEID) SMCD version 2 defines a System Enterprise ID (short SEID). This patch contains the SEID creation and adds the callback to retrieve the created SEID. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:02 -07:00
Ursula Braun	3fc6493761	net/smc: prepare for more proposed ISM devices SMCD Version 2 allows proposing of up to 8 ISM devices in addition to the native ISM device of SMCD Version 1. This patch prepares the struct smc_init_info to deal with these additional 8 ISM devices. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:02 -07:00
Ursula Braun	e15c6c46de	net/smc: split CLC confirm/accept data to be sent When sending CLC confirm and CLC accept, separate the trailing part of the message from the initial part (to be prepared for future first contact extension). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:02 -07:00
Ursula Braun	7affc80982	net/smc: separate find device functions This patch provides better separation of device determinations in function smc_listen_work(). No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:02 -07:00
Ursula Braun	f1eb02f952	net/smc: CLC header fields renaming SMCD version 2 defines 2 more bits in the CLC header to specify version 2 types. This patch prepares better naming of the CLC header fields. No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:02 -07:00
Karsten Graul	a304e29a24	net/smc: remove constant and introduce helper to check for a pnet id Use the existing symbol _S instead of SMC_ASCII_BLANK, and introduce a helper to check if a pnetid is set. No functional change. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:19:02 -07:00
Taehee Yoo	1fc70edb7d	net: core: add nested_level variable in net_device This patch is to add a new variable 'nested_level' into the net_device structure. This variable will be used as a parameter of spin_lock_nested() of dev->addr_list_lock. netif_addr_lock() can be called recursively so spin_lock_nested() is used instead of spin_lock() and dev->lower_level is used as a parameter of spin_lock_nested(). But, dev->lower_level value can be updated while it is being used. So, lockdep would warn a possible deadlock scenario. When a stacked interface is deleted, netif_{uc \| mc}_sync() is called recursively. So, spin_lock_nested() is called recursively too. At this moment, the dev->lower_level variable is used as a parameter of it. dev->lower_level value is updated when interfaces are being unlinked/linked immediately. Thus, After unlinking, dev->lower_level shouldn't be a parameter of spin_lock_nested(). A (macvlan) \| B (vlan) \| C (bridge) \| D (macvlan) \| E (vlan) \| F (bridge) A->lower_level : 6 B->lower_level : 5 C->lower_level : 4 D->lower_level : 3 E->lower_level : 2 F->lower_level : 1 When an interface 'A' is removed, it releases resources. At this moment, netif_addr_lock() would be called. Then, netdev_upper_dev_unlink() is called recursively. Then dev->lower_level is updated. There is no problem. But, when the bridge module is removed, 'C' and 'F' interfaces are removed at once. If 'F' is removed first, a lower_level value is like below. A->lower_level : 5 B->lower_level : 4 C->lower_level : 3 D->lower_level : 2 E->lower_level : 1 F->lower_level : 1 Then, 'C' is removed. at this moment, netif_addr_lock() is called recursively. The ordering is like this. C(3)->D(2)->E(1)->F(1) At this moment, the lower_level value of 'E' and 'F' are the same. So, lockdep warns a possible deadlock scenario. In order to avoid this problem, a new variable 'nested_level' is added. This value is the same as dev->lower_level - 1. But this value is updated in rtnl_unlock(). So, this variable can be used as a parameter of spin_lock_nested() safely in the rtnl context. Test commands: ip link add br0 type bridge vlan_filtering 1 ip link add vlan1 link br0 type vlan id 10 ip link add macvlan2 link vlan1 type macvlan ip link add br3 type bridge vlan_filtering 1 ip link set macvlan2 master br3 ip link add vlan4 link br3 type vlan id 10 ip link add macvlan5 link vlan4 type macvlan ip link add br6 type bridge vlan_filtering 1 ip link set macvlan5 master br6 ip link add vlan7 link br6 type vlan id 10 ip link add macvlan8 link vlan7 type macvlan ip link set br0 up ip link set vlan1 up ip link set macvlan2 up ip link set br3 up ip link set vlan4 up ip link set macvlan5 up ip link set br6 up ip link set vlan7 up ip link set macvlan8 up modprobe -rv bridge Splat looks like: [ 36.057436][ T744] WARNING: possible recursive locking detected [ 36.058848][ T744] 5.9.0-rc6+ #728 Not tainted [ 36.059959][ T744] -------------------------------------------- [ 36.061391][ T744] ip/744 is trying to acquire lock: [ 36.062590][ T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30 [ 36.064922][ T744] [ 36.064922][ T744] but task is already holding lock: [ 36.066626][ T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60 [ 36.068851][ T744] [ 36.068851][ T744] other info that might help us debug this: [ 36.070731][ T744] Possible unsafe locking scenario: [ 36.070731][ T744] [ 36.072497][ T744] CPU0 [ 36.073238][ T744] ---- [ 36.074007][ T744] lock(&vlan_netdev_addr_lock_key); [ 36.075290][ T744] lock(&vlan_netdev_addr_lock_key); [ 36.076590][ T744] [ 36.076590][ T744] * DEADLOCK * [ 36.076590][ T744] [ 36.078515][ T744] May be due to missing lock nesting notation [ 36.078515][ T744] [ 36.080491][ T744] 3 locks held by ip/744: [ 36.081471][ T744] #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490 [ 36.083614][ T744] #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60 [ 36.085942][ T744] #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80 [ 36.088400][ T744] [ 36.088400][ T744] stack backtrace: [ 36.089772][ T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728 [ 36.091364][ T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 36.093630][ T744] Call Trace: [ 36.094416][ T744] dump_stack+0x77/0x9b [ 36.095385][ T744] __lock_acquire+0xbc3/0x1f40 [ 36.096522][ T744] lock_acquire+0xb4/0x3b0 [ 36.097540][ T744] ? dev_set_rx_mode+0x19/0x30 [ 36.098657][ T744] ? rtmsg_ifinfo+0x1f/0x30 [ 36.099711][ T744] ? __dev_notify_flags+0xa5/0xf0 [ 36.100874][ T744] ? rtnl_is_locked+0x11/0x20 [ 36.101967][ T744] ? __dev_set_promiscuity+0x7b/0x1a0 [ 36.103230][ T744] _raw_spin_lock_bh+0x38/0x70 [ 36.104348][ T744] ? dev_set_rx_mode+0x19/0x30 [ 36.105461][ T744] dev_set_rx_mode+0x19/0x30 [ 36.106532][ T744] dev_set_promiscuity+0x36/0x50 [ 36.107692][ T744] __dev_set_promiscuity+0x123/0x1a0 [ 36.108929][ T744] dev_set_promiscuity+0x1e/0x50 [ 36.110093][ T744] br_port_set_promisc+0x1f/0x40 [bridge] [ 36.111415][ T744] br_manage_promisc+0x8b/0xe0 [bridge] [ 36.112728][ T744] __dev_set_promiscuity+0x123/0x1a0 [ 36.113967][ T744] ? __hw_addr_sync_one+0x23/0x50 [ 36.115135][ T744] __dev_set_rx_mode+0x68/0x90 [ 36.116249][ T744] dev_uc_sync+0x70/0x80 [ 36.117244][ T744] dev_uc_add+0x50/0x60 [ 36.118223][ T744] macvlan_open+0x18e/0x1f0 [macvlan] [ 36.119470][ T744] __dev_open+0xd6/0x170 [ 36.120470][ T744] __dev_change_flags+0x181/0x1d0 [ 36.121644][ T744] dev_change_flags+0x23/0x60 [ 36.122741][ T744] do_setlink+0x30a/0x11e0 [ 36.123778][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.124929][ T744] ? __nla_validate_parse.part.6+0x45/0x8e0 [ 36.126309][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.127457][ T744] __rtnl_newlink+0x546/0x8e0 [ 36.128560][ T744] ? lock_acquire+0xb4/0x3b0 [ 36.129623][ T744] ? deactivate_slab.isra.85+0x6a1/0x850 [ 36.130946][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.132102][ T744] ? lock_acquire+0xb4/0x3b0 [ 36.133176][ T744] ? is_bpf_text_address+0x5/0xe0 [ 36.134364][ T744] ? rtnl_newlink+0x2e/0x70 [ 36.135445][ T744] ? rcu_read_lock_sched_held+0x32/0x60 [ 36.136771][ T744] ? kmem_cache_alloc_trace+0x2d8/0x380 [ 36.138070][ T744] ? rtnl_newlink+0x2e/0x70 [ 36.139164][ T744] rtnl_newlink+0x47/0x70 [ ... ] Fixes: `845e0ebb44` ("net: change addr_list_lock back to static key") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:00:15 -07:00
Taehee Yoo	eff7423365	net: core: introduce struct netdev_nested_priv for nested interface infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper \| lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:00:15 -07:00
Taehee Yoo	fe8300fd8d	net: core: add __netdev_upper_dev_unlink() The netdev_upper_dev_unlink() has to work differently according to flags. This idea is the same with __netdev_upper_dev_link(). In the following patches, new flags will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:00:14 -07:00
Cong Wang	1aad804990	net_sched: remove a redundant goto chain check All TC actions call tcf_action_check_ctrlact() to validate goto chain, so this check in tcf_action_init_1() is actually redundant. Remove it to save troubles of leaking memory. Fixes: `e49d8c22f1` ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()") Reported-by: Vlad Buslov <vladbu@mellanox.com> Suggested-by: Davide Caratti <dcaratti@redhat.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 12:58:45 -07:00
Song Liu	1b4d60ec16	bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint Add .test_run for raw_tracepoint. Also, introduce a new feature that runs the target program on a specific CPU. This is achieved by a new flag in bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for BPF programs that handle perf_event and other percpu resources, as the program can access these resource locally. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com	2020-09-28 21:52:36 +02:00
Jakub Kicinski	74cc6d182d	udp_tunnel: add the ability to share port tables Unfortunately recent Intel NIC designs share the UDP port table across netdevs. So far the UDP tunnel port state was maintained per netdev, we need to extend that to cater to Intel NICs. Expect NICs to allocate the info structure dynamically and link to the state from there. All the shared NICs will record port offload information in the one instance of the table so we need to make sure that the use count can accommodate larger numbers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 12:50:12 -07:00
Nikolay Aleksandrov	f2f3729fb6	net: bridge: fdb: don't flush ext_learn entries When a user-space software manages fdb entries externally it should set the ext_learn flag which marks the fdb entry as externally managed and avoids expiring it (they're treated as static fdbs). Unfortunately on events where fdb entries are flushed (STP down, netlink fdb flush etc) these fdbs are also deleted automatically by the bridge. That in turn causes trouble for the managing user-space software (e.g. in MLAG setups we lose remote fdb entries on port flaps). These entries are completely externally managed so we should avoid automatically deleting them, the only exception are offloaded entries (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as before. Fixes: `eb100e0e24` ("net: bridge: allow to add externally learned entries from user-space") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 12:47:43 -07:00
David S. Miller	a4be47afb0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-09-28 1) Fix a build warning in ip_vti if CONFIG_IPV6 is not set. From YueHaibing. 2) Restore IPCB on espintcp before handing the packet to xfrm as the information there is still needed. From Sabrina Dubroca. 3) Fix pmtu updating for xfrm interfaces. From Sabrina Dubroca. 4) Some xfrm state information was not cloned with xfrm_do_migrate. Fixes to clone the full xfrm state, from Antony Antony. 5) Use the correct address family in xfrm_state_find. The struct flowi must always be interpreted along with the original address family. This got lost over the years. Fix from Herbert Xu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 12:25:42 -07:00
Magnus Karlsson	1fd17c8cd0	xsk: Fix possible crash in socket_release when out-of-memory Fix possible crash in socket_release when an out-of-memory error has occurred in the bind call. If a socket using the XDP_SHARED_UMEM flag encountered an error in xp_create_and_assign_umem, the bind code jumped to the exit routine but erroneously forgot to set the err value before jumping. This meant that the exit routine thought the setup went well and set the state of the socket to XSK_BOUND. The xsk socket release code will then, at application exit, think that this is a properly setup socket, when it is not, leading to a crash when all fields in the socket have in fact not been initialized properly. Fix this by setting the err variable in xsk_bind so that the socket is not set to XSK_BOUND which leads to the clean-up in xsk_release not being triggered. Fixes: `1c1efc2af1` ("xsk: Create and free buffer pool independently from umem") Reported-by: syzbot+ddc7b4944bc61da19b81@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1601112373-10595-1-git-send-email-magnus.karlsson@gmail.com	2020-09-28 21:18:01 +02:00
Rajkumar Manoharan	f5bec330e3	nl80211: extend support to config spatial reuse parameter set Allow the user to configure below Spatial Reuse Parameter Set element. * Non-SRG OBSS PD Max Offset * SRG BSS Color Bitmap * SRG Partial BSSID Bitmap Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Link: https://lore.kernel.org/r/1601278091-20313-2-git-send-email-rmanohar@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 15:07:41 +02:00
Ben Greear	265a070833	mac80211: Support not iterating over not-sdata-in-driver ifaces Allow drivers to request that interface-iterator does NOT iterate over interfaces that are not sdata-in-driver. This will allow us to fix crashes in ath10k (and possibly other drivers). To summarize Johannes' explanation: Consider add interface wlan0 add interface wlan1 iterate active interfaces -> wlan0 wlan1 add interface wlan2 iterate active interfaces -> wlan0 wlan1 wlan2 If you apply this scenario to a restart, which ought to be functionally equivalent to the normal startup, just compressed in time, you're basically saying that today you get add interface wlan0 add interface wlan1 iterate active interfaces -> wlan0 wlan1 wlan2 << problem here add interface wlan2 iterate active interfaces -> wlan0 wlan1 wlan2 which yeah, totally seems wrong. But fixing that to be add interface wlan0 add interface wlan1 iterate active interfaces -> <nothing> add interface wlan2 iterate active interfaces -> <nothing> (or maybe -> wlan0 wlan1 wlan2 if the reconfig already completed) This is also at least somewhat wrong, but better to not iterate over something that exists in the driver than iterate over something that does not. Originally the first issue was causing crashes in testing with lots of station vdevs on an ath10k radio, combined with firmware crashing. I ran with a similar patch for years with no obvious bad results, including significant testing with ath9k and ath10k. Signed-off-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20200922191957.25257-1-greearb@candelatech.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 15:05:53 +02:00
Rajkumar Manoharan	6c8b6e4a5f	nl80211: fix OBSS PD min and max offset validation The SRG min and max offset won't present when SRG Information Present of SR control field of Spatial Reuse Parameter Set element set to 0. Per spec. IEEE802.11ax D7.0, SRG OBSS PD Min Offset ≤ SRG OBSS PD Max Offset. Hence fix the constrain check to allow same values in both offset and also call appropriate nla_get function to read the values. Fixes: `796e90f42b` ("cfg80211: add support for parsing OBBS_PD attributes") Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Link: https://lore.kernel.org/r/1601278091-20313-1-git-send-email-rmanohar@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 15:05:46 +02:00
Johannes Berg	21439b652b	mac80211: fix some more kernel-doc in mesh Add a few more missing kernel-doc annotations in mesh code. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200928135129.6409460c28b7.I43657d0b70398723e59e4e724f56af88af0fec7e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 14:36:53 +02:00
Dan Carpenter	735b267394	cfg80211: regulatory: remove a bogus initialization The the __freq_reg_info() never returns NULL and the callers don't check for NULL. This initialization to set "reg_rule = NULL;" is just there to make GCC happy but it's not required in current GCCs. The problem is that Smatch sees the initialization and concludes that this function can return NULL so it complains that the callers are not checking for it. Smatch used to be able to parse this correctly but we recently changed the code from: - for (bw = MHZ_TO_KHZ(20); bw >= min_bw; bw = bw / 2) { + for (bw = MHZ_TO_KHZ(bws[i]); bw >= min_bw; bw = MHZ_TO_KHZ(bws[i--])) { Originally Smatch used to understand that this code always iterates through the loop once, but the change from "MHZ_TO_KHZ(20)" to "MHZ_TO_KHZ(bws[i])" is too complicated for Smatch. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20200923084203.GC1454948@mwanda Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 14:20:58 +02:00
Felix Fietkau	e3f25908b0	mac80211: fix regression in sta connection monitor When a frame was acked and probe frames were sent, the connection monitoring needs to be reset, otherwise it will keep probing until the connection is considered dead, even though frames have been acked in the mean time. Fixes: `9abf4e4983` ("mac80211: optimize station connection monitor") Reported-by: Georgi Valkov <gvalkov@abv.bg> Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200927105605.97954-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 14:19:55 +02:00
Thomas Pedersen	58ef7c1b55	nl80211: include frequency offset in survey info Recently channels gained a potential frequency offset, so include this in the per-channel survey info. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-16-thomas@adapt-ip.com [add the offset only if non-zero] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 14:09:52 +02:00
Thomas Pedersen	1d00ce807e	mac80211: support S1G association The changes required for associating in S1G are: - apply S1G BSS channel info before assoc - mark all S1G STAs as QoS STAs - include and parse AID request element - handle new Association Response format - don't fail assoc if supported rates element is missing Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-15-thomas@adapt-ip.com [pass skb to ieee80211_add_aid_request_ie(), remove unused variable 'bss'] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 14:09:07 +02:00
Thomas Pedersen	09a740ce35	mac80211: receive and process S1G beacons S1G beacons are 802.11 Extension Frames, so the fixed header part differs from regular beacons. Add a handler to process S1G beacons and abstract out the fetching of BSSID and element start locations in the beacon body handler. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-14-thomas@adapt-ip.com [don't rename, small coding style cleanups] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 14:01:00 +02:00
Thomas Pedersen	cac8c526ae	mac80211: avoid rate init for S1G band minstrel_ht is confused by the lack of sband->bitrates, and S1G will likely require a unique RC algorithm, so avoid rate init for now if STA is on the S1G band. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-13-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:57:24 +02:00
Thomas Pedersen	1821f8b36f	mac80211: handle S1G low rates S1G doesn't have legacy (sband->bitrates) rates, only MCS. For now, just send a frame at MCS 0 if a low rate is requested. Note we also redefine (since we're out of TX flags) TX_RC_VHT_MCS as TX_RC_S1G_MCS to indicate an S1G MCS. This is probably OK as VHT MCS is not valid on S1G band and vice versa. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-12-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:56:37 +02:00
Thomas Pedersen	89b8c02a35	mac80211: don't calculate duration for S1G For now just skip the duration calculation for frames transmitted on the S1G band and avoid a warning. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-11-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:56:28 +02:00
Thomas Pedersen	05d109576a	mac80211: encode listen interval for S1G S1G allows listen interval up to 2^14 * 10000 beacon intervals. In order to do this listen interval needs a scaling factor applied to the lower 14 bits. Calculate this and properly encode the listen interval for S1G STAs. See IEEE802.11ah-2016 Table 9-44a for reference. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-10-thomas@adapt-ip.com [move listen_int_usf into function using it] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:55:58 +02:00
Thomas Pedersen	80ca257113	cfg80211: handle Association Response from S1G STA The sending STA type is implicit based on beacon or probe response content. If sending STA was an S1G STA, adjust the Information Element location accordingly. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-9-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:54:03 +02:00
Thomas Pedersen	cd418ba63f	mac80211: convert S1G beacon to scan results This commit finds the correct offset for Information Elements in S1G beacon frames so they can be reported in scan results. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-8-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:25 +02:00
Thomas Pedersen	66b0564d7e	cfg80211: parse S1G Operation element for BSS channel Extract the BSS primary channel from the S1G Operation element. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-7-thomas@adapt-ip.com [remove the goto bits] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:05 +02:00
Thomas Pedersen	9eaffe5078	cfg80211: convert S1G beacon to scan results The S1G beacon is an extension frame as opposed to management frame for the regular beacon. This means we may have to occasionally cast the frame buffer to a different header type. Luckily this isn't too bad as scan results mostly only care about the IEs. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-6-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:05 +02:00
Thomas Pedersen	7957c6c814	mac80211: support S1G STA capabilities Include the S1G Capabilities element in an association request, and support the cfg80211 capability overrides. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-5-thomas@adapt-ip.com [pass skb to ieee80211_add_s1g_capab_ie(), small code style edits] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:05 +02:00
Thomas Pedersen	d2b7588a47	nl80211: support S1G capability overrides in assoc NL80211_ATTR_S1G_CAPABILITY can be passed along with NL80211_ATTR_S1G_CAPABILITY_MASK to NL80211_CMD_ASSOCIATE to indicate S1G capabilities which should override the hardware capabilities in eg. the association request. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-4-thomas@adapt-ip.com [johannes: always require both attributes together, commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:05 +02:00
Thomas Pedersen	75b1593533	mac80211: s1g: choose scanning width based on frequency An S1G BSS can beacon at either 1 or 2 MHz and the channel width is unique to a given frequency. Ignore scan channel width for now and use the allowed channel width. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-3-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:05 +02:00
Thomas Pedersen	5e48077498	mac80211: get correct default channel width for S1G When deleting a channel context, mac80211 would assing NL80211_CHAN_WIDTH_20_NOHT as the default channel width. This is wrong in S1G however, so instead get the allowed channel width for a given channel. Fixes eg. configuring strange (20Mhz) width during a scan on the S1G band. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200922022818.15855-2-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:05 +02:00
Johannes Berg	211f204159	wireless: radiotap: fix some kernel-doc The vendor namespaces argument isn't described here, add it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200924192511.2bf5cc761d3a.I9b4579ab3eebe3d7889b59eea8fa50d683611bab@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:05 +02:00
Johannes Berg	f0daf54f4e	mac80211: fix some missing kernel-doc Two parameters are not described, add them. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200924192511.21411377b0a8.I1add147d782a3bf38287bde412dc98f69323c732@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:04 +02:00
Tova Mussai	c8cb5b854b	nl80211/cfg80211: support 6 GHz scanning Support 6 GHz scanning, by * a new scan flag to scan for colocated BSSes advertised by (and found) APs on 2.4 & 5 GHz * doing the necessary reduced neighbor report parsing for this, to find them * adding the ability to split the scan request in case the device by itself cannot support this. Also add some necessary bits in mac80211 to not break with these changes. Signed-off-by: Tova Mussai <tova.mussai@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200918113313.232917c93af9.Ida22f0212f9122f47094d81659e879a50434a6a2@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:53:04 +02:00
Loic Poulain	dba0491f00	mac80211: Inform AP when returning operating channel Because we can miss AP wakeup (beacon) while scanning other channels, it's better go into wakeup state and inform the AP of that upon returning to the operating channel, rather than staying asleep and waiting for the next TIM indicating traffic for us. This saves precious time, especially when we only have 200ms inter- scan period for monitoring the active connection. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Link: https://lore.kernel.org/r/1593420923-26668-1-git-send-email-loic.poulain@linaro.org [rewrite commit message a bit] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-09-28 13:18:53 +02:00
Florian Fainelli	0675c285ea	net: vlan: Fixed signedness in vlan_group_prealloc_vid() After commit `d0186842ec` ("net: vlan: Avoid using BUG() in vlan_proto_idx()"), vlan_proto_idx() was changed to return a signed integer, however one of its called: vlan_group_prealloc_vid() was still using an unsigned integer for its return value, fix that. Fixes: `d0186842ec` ("net: vlan: Avoid using BUG() in vlan_proto_idx()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 00:51:39 -07:00
Vladimir Oltean	300fd579b2	net: dsa: tag_rtl4_a: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: Linus Walleij <linus.walleij@linaro.org> Cc: DENG Qingfang <dqfext@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	e665297983	net: dsa: tag_sja1105: use a custom flow dissector procedure The sja1105 is a bit of a special snowflake, in that not all frames are transmitted/received in the same way. L2 link-local frames are received with the source port/switch ID information put in the destination MAC address. For the rest, a tag_8021q header is used. So only the latter frames displace the rest of the headers and need to use the generic flow dissector procedure. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	6b04f171dc	net: dsa: tag_qca: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: John Crispin <john@phrozen.org> Cc: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	b1af365637	net: dsa: tag_mtk: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: DENG Qingfang <dqfext@gmail.com> Cc: Sean Wang <sean.wang@mediatek.com> Cc: John Crispin <john@phrozen.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	742b2e1951	net: dsa: tag_edsa: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	11f5011189	net: dsa: tag_dsa: use the generic flow dissector procedure Remove the .flow_dissect procedure, so the flow dissector will call the generic variant which works for this tagging protocol. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	f569ad5257	net: dsa: tag_brcm: use generic flow dissector procedure There are 2 Broadcom tags in use, one places the DSA tag before the Ethernet destination MAC address, and the other before the EtherType. Nonetheless, both displace the rest of the headers, so this tagger can use the generic flow dissector procedure which accounts for that. The ASCII art drawing is a good reference though, so keep it but move it somewhere else. Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	54fec33582	net: flow_dissector: avoid indirect call to DSA .flow_dissect for generic case With the recent mitigations against speculative execution exploits, indirect function calls are more expensive and it would be good to avoid them where possible. In the case of DSA, most switch taggers will shift the EtherType and next headers by a fixed amount equal to that tag's length in bytes. So we can use a generic procedure to determine that, without calling into custom tagger code. However we still leave the flow_dissect method inside struct dsa_device_ops as an override for the generic function. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	7a6ffe764b	net: dsa: point out the tail taggers The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and KSZ9893 switches also use tail tags. Tell that to the DSA core, since this makes a difference for the flow dissector. Most switches break the parsing of frame headers, but these ones don't, so no flow dissector adjustment needs to be done for them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	2e8cb1b3db	net: dsa: make the .flow_dissect tagger callback return void There is no tagger that returns anything other than zero, so just change the return type appropriately. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:59 -07:00
Vladimir Oltean	5124197ce5	net: dsa: tag_ocelot: use a short prefix on both ingress and egress There are 2 goals that we follow: - Reduce the header size - Make the header size equal between RX and TX The issue that required long prefix on RX was the fact that the ocelot DSA tag, being put before Ethernet as it is, would overlap with the area that a DSA master uses for RX filtering (destination MAC address mainly). Now that we can ask DSA to put the master in promiscuous mode, in theory we could remove the prefix altogether and call it a day, but it looks like we can't. Using no prefix on ingress, some packets (such as ICMP) would be received, while others (such as PTP) would not be received. This is because the DSA master we use (enetc) triggers parse errors ("MAC rx frame errors") presumably because it sees Ethernet frames with a bad length. And indeed, when using no prefix, the EtherType (bytes 12-13 of the frame, bits 96-111) falls over the REW_VAL field from the extraction header, aka the PTP timestamp. When turning the short (32-bit) prefix on, the EtherType overlaps with bits 64-79 of the extraction header, which are a reserved area transmitted as zero by the switch. The packets are not dropped by the DSA master with a short prefix. Actually, the frames look like this in tcpdump (below is a PTP frame, with an extra dsa_8021q tag - dadb 0482 - added by a downstream sja1105). 89:0c:a9:f2:01:00 > 88:80:00:0a:00:1d, 802.3, length 0: LLC, \ dsap Unknown (0x10) Individual, ssap ProWay NM (0x0e) Response, \ ctrl 0x0004: Information, send seq 2, rcv seq 0, \ Flags [Response], length 78 0x0000: 8880 000a 001d 890c a9f2 0100 0000 100f ................ 0x0010: 0400 0000 0180 c200 000e 001f 7b63 0248 ............{c.H 0x0020: dadb 0482 88f7 1202 0036 0000 0000 0000 .........6...... 0x0030: 0000 0000 0000 0000 0000 001f 7bff fe63 ............{..c 0x0040: 0248 0001 1f81 0500 0000 0000 0000 0000 .H.............. 0x0050: 0000 0000 0000 0000 0000 0000 ............ So the short prefix is our new default: we've shortened our RX frames by 12 octets, increased TX by 4, and headers are now equal between RX and TX. Note that we still need promiscuous mode for the DSA master to not drop it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:58 -07:00
Vladimir Oltean	707091eb26	net: dsa: tag_sja1105: request promiscuous mode for master Currently PTP is broken when ports are in standalone mode (the tagger keeps printing this message): sja1105 spi0.1: Expected meta frame, is 01-80-c2-00-00-0e in the DSA master multicast filter? Sure, one might say "simply add 01-80-c2-00-00-0e to the master's RX filter" but things become more complicated because: - Actually all frames in the 01-80-c2-xx-xx-xx and 01-1b-19-xx-xx-xx range are trapped to the CPU automatically - The switch mangles bytes 3 and 4 of the MAC address via the incl_srcpt ("include source port [in the DMAC]") option, which is how source port and switch id identification is done for link-local traffic on RX. But this means that an address installed to the RX filter would, at the end of the day, not correspond to the final address seen by the DSA master. Assume RX filtering lists on DSA masters are typically too small to include all necessary addresses for PTP to work properly on sja1105, and just request promiscuous mode unconditionally. Just an example: Assuming the following addresses are trapped to the CPU: 01-80-c2-00-00-00 to 01-80-c2-00-00-ff 01-1b-19-00-00-00 to 01-1b-19-00-00-ff These are 512 addresses. Now let's say this is a board with 3 switches, and 4 ports per switch. The 512 addresses become 6144 addresses that must be managed by the DSA master's RX filtering lists. This may be refined in the future, but for now, it is simply not worth it to add the additional addresses to the master's RX filter, so simply request it to become promiscuous as soon as the driver probes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:58 -07:00
Vladimir Oltean	c3975400c8	net: dsa: allow drivers to request promiscuous mode on master Currently DSA assumes that taggers don't mess with the destination MAC address of the frames on RX. That is not always the case. Some DSA headers are placed before the Ethernet header (ocelot), and others simply mangle random bytes from the destination MAC address (sja1105 with its incl_srcpt option). Currently the DSA master goes to promiscuous mode automatically when the slave devices go too (such as when enslaved to a bridge), but in standalone mode this is a problem that needs to be dealt with. So give drivers the possibility to signal that their tagging protocol will get randomly dropped otherwise, and let DSA deal with fixing that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-26 14:17:58 -07:00
Jacob Keller	5d5b4128c4	devlink: introduce flash update overwrite mask Sections of device flash may contain settings or device identifying information. When performing a flash update, it is generally expected that these settings and identifiers are not overwritten. However, it may sometimes be useful to allow overwriting these fields when performing a flash update. Some examples include, 1) customizing the initial device config on first programming, such as overwriting default device identifying information, or 2) reverting a device configuration to known good state provided in the new firmware image, or 3) in case it is suspected that current firmware logic for managing the preservation of fields during an update is broken. Although some devices are able to completely separate these types of settings and fields into separate components, this is not true for all hardware. To support controlling this behavior, a new DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK is defined. This is an nla_bitfield32 which will define what subset of fields in a component should be overwritten during an update. If no bits are specified, or of the overwrite mask is not provided, then an update should not overwrite anything, and should maintain the settings and identifiers as they are in the previous image. If the overwrite mask has the DEVLINK_FLASH_OVERWRITE_SETTINGS bit set, then the device should be configured to overwrite any of the settings in the requested component with settings found in the provided image. Similarly, if the DEVLINK_FLASH_OVERWRITE_IDENTIFIERS bit is set, the device should be configured to overwrite any device identifiers in the requested component with the identifiers from the image. Multiple overwrite modes may be combined to indicate that a combination of the set of fields that should be overwritten. Drivers which support the new overwrite mask must set the DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK in the supported_flash_update_params field of their devlink_ops. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 17:20:57 -07:00
Jacob Keller	bc75c054f0	devlink: convert flash_update to use params structure The devlink core recently gained support for checking whether the driver supports a flash_update parameter, via `supported_flash_update_params`. However, parameters are specified as function arguments. Adding a new parameter still requires modifying the signature of the .flash_update callback in all drivers. Convert the .flash_update function to take a new `struct devlink_flash_update_params` instead. By using this structure, and the `supported_flash_update_params` bit field, a new parameter to flash_update can be added without requiring modification to existing drivers. As before, all parameters except file_name will require driver opt-in. Because file_name is a necessary field to for the flash_update to make sense, no "SUPPORTED" bitflag is provided and it is always considered valid. All future additional parameters will require a new bit in the supported_flash_update_params bitfield. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Bin Luo <luobin9@huawei.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Danielle Ratson <danieller@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 17:20:57 -07:00
Jacob Keller	22ec3d232f	devlink: check flash_update parameter support in net core When implementing .flash_update, drivers which do not support per-component update are manually checking the component parameter to verify that it is NULL. Without this check, the driver might accept an update request with a component specified even though it will not honor such a request. Instead of having each driver check this, move the logic into net/core/devlink.c, and use a new `supported_flash_update_params` field in the devlink_ops. Drivers which will support per-component update must now specify this by setting DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT in the supported_flash_update_params in their devlink_ops. This helps ensure that drivers do not forget to check for a NULL component if they do not support per-component update. This also enables a slightly better error message by enabling the core stack to set the netlink bad attribute message to indicate precisely the unsupported attribute in the message. Going forward, any new additional parameter to flash update will require a bit in the supported_flash_update_params bitfield. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Bin Luo <luobin9@huawei.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Danielle Ratson <danieller@mellanox.com> Cc: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 17:20:57 -07:00
Yuchung Cheng	534a2109fb	tcp: consolidate tcp_mark_skb_lost and tcp_skb_mark_lost tcp_skb_mark_lost is used by RFC6675-SACK and can easily be replaced with the new tcp_mark_skb_lost handler. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 17:17:14 -07:00
Yuchung Cheng	686989700c	tcp: simplify tcp_mark_skb_lost This patch consolidates and simplifes the loss marking logic used by a few loss detections (RACK, RFC6675, NewReno). Previously each detection uses a subset of several intertwined subroutines. This unncessary complexity has led to bugs (and fixes of bug fixes). tcp_mark_skb_lost now is the single one routine to mark a packet loss when a loss detection caller deems an skb ist lost: 1. rewind tp->retransmit_hint_skb if skb has lower sequence or all lost ones have been retransmitted. 2. book-keeping: adjust flags and counts depending on if skb was retransmitted or not. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 17:17:14 -07:00
Yuchung Cheng	fd2146741c	tcp: move tcp_mark_skb_lost A pure refactor to move tcp_mark_skb_lost to tcp_input.c to prepare for the later loss marking consolidation. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 17:17:14 -07:00
Yuchung Cheng	179ac35f2f	tcp: consistently check retransmit hint tcp_simple_retransmit() used for path MTU discovery may not adjust the retransmit hint properly by deducting retrans_out before checking it to adjust the hint. This patch fixes this by a correct routine tcp_mark_skb_lost() already used by the RACK loss detection. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 17:17:14 -07:00
Nikolay Aleksandrov	7470558240	net: bridge: mcast: remove only S,G port groups from sg_port hash We should remove a group from the sg_port hash only if it's an S,G entry. This makes it correct and more symmetric with group add. Also since *,G groups are not added to that hash we can hide a bug. Fixes: `085b53c8be` ("net: bridge: mcast: add sg_port rhashtable") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 16:50:19 -07:00
J. Bruce Fields	0aa99c4d1f	sunrpc: simplify do_cache_clean Is it just me, or is the logic written in a slightly convoluted way? I find it a little easier to read this way. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-09-25 18:02:02 -04:00
Xu Wang	9dbc1f45d5	sunrpc: cache : Replace seq_printf with seq_puts seq_puts is a lot cheaper than seq_printf, so use that to print literal strings. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-09-25 18:01:28 -04:00
Anna Schumaker	403217f304	SUNRPC/NFSD: Implement xdr_reserve_space_vec() Reserving space for a large READ payload requires special handling when reserving space in the xdr buffer pages. One problem we can have is use of the scratch buffer, which is used to get a pointer to a contiguous region of data up to PAGE_SIZE. When using the scratch buffer, calls to xdr_commit_encode() shift the data to it's proper alignment in the xdr buffer. If we've reserved several pages in a vector, then this could potentially invalidate earlier pointers and result in incorrect READ data being sent to the client. I get around this by looking at the amount of space left in the current page, and never reserve more than that for each entry in the read vector. This lets us place data directly where it needs to go in the buffer pages. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-09-25 18:01:27 -04:00
Randy Dunlap	1cc5213baa	net: sunrpc: delete repeated words Drop duplicate words in net/sunrpc/. Also fix "Anyone" to be "Any one". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-09-25 18:01:26 -04:00
Florian Fainelli	d0186842ec	net: vlan: Avoid using BUG() in vlan_proto_idx() While we should always make sure that we specify a valid VLAN protocol to vlan_proto_idx(), killing the machine when an invalid value is specified is too harsh and not helpful for debugging. All callers are capable of dealing with an error returned by vlan_proto_idx() so check the index value and propagate it accordingly. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 14:12:18 -07:00
Martin KaFai Lau	27e5203bd9	bpf: Change bpf_sk_assign to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON This patch changes the bpf_sk_assign() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_*() helpers also. The bpf_sk_lookup_assign() is taking ARG_PTR_TO_SOCKET_"OR_NULL". Meaning it specifically takes a literal NULL. ARG_PTR_TO_BTF_ID_SOCK_COMMON does not allow a literal NULL, so another ARG type is required for this purpose and another follow-up patch can be used if there is such need. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000415.3857374-1-kafai@fb.com	2020-09-25 13:58:02 -07:00
Martin KaFai Lau	c0df236e13	bpf: Change bpf_tcp__syncookie to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON This patch changes the bpf_tcp__syncookie() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_*() helpers also. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20200925000409.3856725-1-kafai@fb.com	2020-09-25 13:58:01 -07:00
Martin KaFai Lau	592a349864	bpf: Change bpf_sk_storage_() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON This patch changes the bpf_sk_storage_() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_*() helpers also. A micro benchmark has been done on a "cgroup_skb/egress" bpf program which does a bpf_sk_storage_get(). It was driven by netperf doing a 4096 connected UDP_STREAM test with 64bytes packet. The stats from "kernel.bpf_stats_enabled" shows no meaningful difference. The sk_storage_get_btf_proto, sk_storage_delete_btf_proto, btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are no longer needed, so they are removed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com	2020-09-25 13:58:01 -07:00
Martin KaFai Lau	a5fa25adf0	bpf: Change bpf_sk_release and bpf_sk_cgroup_id to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON The previous patch allows the networking bpf prog to use the bpf_skc_to_() helpers to get a PTR_TO_BTF_ID socket pointer, e.g. "struct tcp_sock ". It allows the bpf prog to read all the fields of the tcp_sock. This patch changes the bpf_sk_release() and bpf_sk_cgroup_id() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_() helpers also. For example, the following will work: sk = bpf_skc_lookup_tcp(skb, tuple, tuplen, BPF_F_CURRENT_NETNS, 0); if (!sk) return; tp = bpf_skc_to_tcp_sock(sk); if (!tp) { bpf_sk_release(sk); return; } lsndtime = tp->lsndtime; / Pass tp to bpf_sk_release() will also work / bpf_sk_release(tp); Since PTR_TO_BTF_ID could be NULL, the helper taking ARG_PTR_TO_BTF_ID_SOCK_COMMON has to check for NULL at runtime. A btf_id of "struct sock" may not always mean a fullsock. Regardless the helper's running context may get a non-fullsock or not, considering fullsock check/handling is pretty cheap, it is better to keep the same verifier expectation on helper that takes ARG_PTR_TO_BTF_ID will be able to handle the minisock situation. In the bpf_sk_cgroup_id() case, it will try to get a fullsock by using sk_to_full_sk() as its skb variant bpf_sk"b"_cgroup_id() has already been doing. bpf_sk_release can already handle minisock, so nothing special has to be done. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000356.3856047-1-kafai@fb.com	2020-09-25 13:58:01 -07:00
Martin KaFai Lau	1df8f55a37	bpf: Enable bpf_skc_to_* sock casting helper to networking prog type There is a constant need to add more fields into the bpf_tcp_sock for the bpf programs running at tc, sock_ops...etc. A current workaround could be to use bpf_probe_read_kernel(). However, other than making another helper call for reading each field and missing CO-RE, it is also not as intuitive to use as directly reading "tp->lsndtime" for example. While already having perfmon cap to do bpf_probe_read_kernel(), it will be much easier if the bpf prog can directly read from the tcp_sock. This patch tries to do that by using the existing casting-helpers bpf_skc_to_() whose func_proto returns a btf_id. For example, the func_proto of bpf_skc_to_tcp_sock returns the btf_id of the kernel "struct tcp_sock". These helpers are also added to is_ptr_cast_function(). It ensures the returning reg (BPF_REF_0) will also carries the ref_obj_id. That will keep the ref-tracking works properly. The bpf_skc_to_ helpers are made available to most of the bpf prog types in filter.c. The bpf_skc_to_* helpers will be limited by perfmon cap. This patch adds a ARG_PTR_TO_BTF_ID_SOCK_COMMON. The helper accepting this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON]) or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON). The bpf_skc_to_() helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will accept pointer obtained from skb->sk. Instead of specifying both arg_type and arg_btf_id in the same func_proto which is how the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the compatible_reg_types[] in verifier.c. The reason is the arg_btf_id is always the same. Discussion in this thread: https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/ The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is expecting a PTR_TO_BTF_ID which could be NULL. This is the same behavior as the existing helper taking ARG_PTR_TO_BTF_ID. The _SOCK_COMMON part means the helper is also expecting the legacy SOCK_COMMON pointer. By excluding the _OR_NULL part, the bpf prog cannot call helper with a literal NULL which doesn't make sense in most cases. e.g. bpf_skc_to_tcp_sock(NULL) will be rejected. All PTR_TO__OR_NULL reg has to do a NULL check first before passing into the helper or else the bpf prog will be rejected. This behavior is nothing new and consistent with the current expectation during bpf-prog-load. [ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace ARG_PTR_TO_SOCK* of other existing helpers later such that those existing helpers can take the PTR_TO_BTF_ID returned by the bpf_skc_to_() helpers. The only special case is bpf_sk_lookup_assign() which can accept a literal NULL ptr. It has to be handled specially in another follow up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ] [ When converting the older helpers that take ARG_PTR_TO_SOCK in the later patch, if the kernel does not support BTF, ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON because no reg->type could have PTR_TO_BTF_ID in this case. It is not a concern for the newer-btf-only helper like the bpf_skc_to_*() here though because these helpers must require BTF vmlinux to begin with. ] Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com	2020-09-25 13:58:01 -07:00
Luiz Augusto von Dentz	b560a208cd	Bluetooth: MGMT: Fix not checking if BT_HS is enabled This checks if BT_HS is enabled relecting it on MGMT_SETTING_HS instead of always reporting it as supported. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-09-25 20:21:55 +02:00
Luiz Augusto von Dentz	b176dd0ef6	Bluetooth: Disable High Speed by default Bluetooth High Speed requires hardware support which is very uncommon nowadays since HS has not pickup interest by the industry. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-09-25 20:21:55 +02:00
Luiz Augusto von Dentz	f19425641c	Bluetooth: L2CAP: Fix calling sk_filter on non-socket based channel Only sockets will have the chan->data set to an actual sk, channels like A2MP would have its own data which would likely cause a crash when calling sk_filter, in order to fix this a new callback has been introduced so channels can implement their own filtering if necessary. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-09-25 20:21:55 +02:00
Luiz Augusto von Dentz	eddb773211	Bluetooth: A2MP: Fix not initializing all members This fixes various places where a stack variable is used uninitialized. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-09-25 20:21:55 +02:00
Linus Torvalds	6d28cf7dfe	Fixes: - Incorrect calculation on platforms that implement flush_dcache_page() -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl9otekACgkQM2qzM29m f5cArhAApdYYg8Y7gkRsh++qWgidQXc+P/Orcu8V866GqKFbBlFflDTt13/isbhq P545F7tFLKIVuFGP06Yg+L2xfFoTCG8nE5oPT1F51vQvIFgStRNk/Sh9CtmMiVIf bdNtGdAUlyYKJOLuLYiZwqwY0GFbdJ1dsKsy1Vm8tuVRyvHJyjre9GLVE5XKEnQ3 4gqwpG3/V3mrr3Pd78cTbgZpl5nk2cNBYEaJEL0D6tpHQMjS5GRApvRt70WSt2zn WKJIP/fHVC+JHFAAUTsO7oa3ZvQC9PZPFRrKAKdr7CLfISzh5jyLSKdO/Rxqf4G5 wIxGfF2KG8AfPnNd0KqnBxKdi5zr/4/NmvI4bI1GmqL8ViTcqJJMpcZBknWgsH4+ 9BbEZZLMJ5UlLTBKej/N/4CQYZb3Vmnz7BZcFewBqBELvmfnAxQSuMonfvDv0VSw DcPcr/h8U5xrBUFtMcj38O1+nSfdMichMwIwxfBhb+rIRmTgDYFWJXeCAPAeWjJT MNT+2QXTMCiuBmKli8GPI7qQ0QC9Ska7wGvn3IVQ5kkbG7wdNH4swUxsGYjbE8xw eM7mZvQB1FYL8UafEqIH5k4LfeLrsd9kxhhUVdOP8r6yXQ8gGUPa5SlemfQcbe0i tVe8R8V6KIaep++SqQ6Emh9jxQwm1NSOj2xm5WClLOQ+FObqssI= =8a9N -----END PGP SIGNATURE----- Merge tag 'nfsd-5.9-2' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull NFS server fix from Chuck Lever: "Fix incorrect calculation on platforms that implement flush_dcache_page()" * tag 'nfsd-5.9-2' of git://git.linux-nfs.org/projects/cel/cel-2.6: SUNRPC: Fix svc_flush_dcache()	2020-09-25 10:46:11 -07:00
Sathish Narasimman	c0ee0644df	Bluetooth: Fix update of own_addr_type if ll_privacy supported During system powercycle when trying to get the random address hci_get_random_address set own_addr_type as 0x01. In which if we enable ll_privacy it is supposed to be 0x03. Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-09-25 17:53:59 +02:00
Herbert Xu	e94ee17134	xfrm: Use correct address family in xfrm_state_find The struct flowi must never be interpreted by itself as its size depends on the address family. Therefore it must always be grouped with its original family value. In this particular instance, the original family value is lost in the function xfrm_state_find. Therefore we get a bogus read when it's coupled with the wrong family which would occur with inter- family xfrm states. This patch fixes it by keeping the original family value. Note that the same bug could potentially occur in LSM through the xfrm_state_pol_flow_match hook. I checked the current code there and it seems to be safe for now as only secid is used which is part of struct flowi_common. But that API should be changed so that so that we don't get new bugs in the future. We could do that by replacing fl with just secid or adding a family field. Reported-by: syzbot+577fbac3145a6eb2e7a5@syzkaller.appspotmail.com Fixes: `48b8d78315` ("[XFRM]: State selection update to use inner...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-25 09:59:51 +02:00
Florian Westphal	77d0cab939	net: tcp: drop unused function argument from mptcp_incoming_options Since commit `cfde141ea3` ("mptcp: move option parsing into mptcp_incoming_options()"), the 3rd function argument is no longer used. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 20:17:01 -07:00
Priyaranjan Jha	ad2b9b0f8d	tcp: skip DSACKs with dubious sequence ranges Currently, we use length of DSACKed range to compute number of delivered packets. And if sequence range in DSACK is corrupted, we can get bogus dsacked/acked count, and bogus cwnd. This patch put bounds on DSACKed range to skip update of data delivery and spurious retransmission information, if the DSACK is unlikely caused by sender's action: - DSACKed range shouldn't be greater than maximum advertised rwnd. - Total no. of DSACKed segments shouldn't be greater than total no. of retransmitted segs. Unlike spurious retransmits, network duplicates or corrupted DSACKs shouldn't be counted as delivery. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 20:15:45 -07:00
Rohit Maheshwari	38f7e1c0c4	net/tls: race causes kernel panic BUG: kernel NULL pointer dereference, address: 00000000000000b8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000008b6fef067 P4D 80000008b6fef067 PUD 8b6fe6067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 12 PID: 23871 Comm: kworker/12:80 Kdump: loaded Tainted: G S 5.9.0-rc3+ #1 Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.1 03/29/2018 Workqueue: events tx_work_handler [tls] RIP: 0010:tx_work_handler+0x1b/0x70 [tls] Code: dc fe ff ff e8 16 d4 a3 f6 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 8b 6f 58 48 8b bd a0 04 00 00 48 85 ff 74 1c 48 8b 47 28 <48> 8b 90 b8 00 00 00 83 e2 02 75 0c f0 48 0f ba b0 b8 00 00 00 00 RSP: 0018:ffffa44ace61fe88 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff91da9e45cc30 RCX: dead000000000122 RDX: 0000000000000001 RSI: ffff91da9e45cc38 RDI: ffff91d95efac200 RBP: ffff91da133fd780 R08: 0000000000000000 R09: 000073746e657665 R10: 8080808080808080 R11: 0000000000000000 R12: ffff91dad7d30700 R13: ffff91dab6561080 R14: 0ffff91dad7d3070 R15: ffff91da9e45cc38 FS: 0000000000000000(0000) GS:ffff91dad7d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000b8 CR3: 0000000906478003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: process_one_work+0x1a7/0x370 worker_thread+0x30/0x370 ? process_one_work+0x370/0x370 kthread+0x114/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 tls_sw_release_resources_tx() waits for encrypt_pending, which can have race, so we need similar changes as in commit `0cada33241` here as well. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 20:06:29 -07:00
Geliang Tang	00cfd77b90	mptcp: retransmit ADD_ADDR when timeout This patch implemented the retransmition of ADD_ADDR when no ADD_ADDR echo is received. It added a timer with the announced address. When timeout occurs, ADD_ADDR will be retransmitted. Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:34 -07:00
Geliang Tang	08b81d8731	mptcp: add sk_stop_timer_sync helper This patch added a new helper sk_stop_timer_sync, it deactivates a timer like sk_stop_timer, but waits for the handler to finish. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:34 -07:00
Geliang Tang	0abd40f823	mptcp: add struct mptcp_pm_add_entry Add a new struct mptcp_pm_add_entry to describe add_addr's entry. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:34 -07:00
Geliang Tang	5c8c164095	mptcp: add mptcp_destroy_common helper This patch added a new helper named mptcp_destroy_common containing the shared code between mptcp_destroy() and mptcp_sock_destruct(). Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:34 -07:00
Geliang Tang	7a7e52e38a	mptcp: add RM_ADDR related mibs This patch added two new mibs for RM_ADDR, named MPTCP_MIB_RMADDR and MPTCP_MIB_RMSUBFLOW, when the RM_ADDR suboption is received, increase the first mib counter, when the local subflow is removed, increase the second mib counter. Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Paolo Abeni <pabeni@redhat.com> Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:34 -07:00
Geliang Tang	0ee4261a36	mptcp: implement mptcp_pm_remove_subflow This patch implemented the local subflow removing function, mptcp_pm_remove_subflow, it simply called mptcp_pm_nl_rm_subflow_received under the PM spin lock. We use mptcp_pm_remove_subflow to remove a local subflow, so change it's argument from remote_id to local_id. We check subflow->local_id in mptcp_pm_nl_rm_subflow_received to remove a subflow. Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Paolo Abeni <pabeni@redhat.com> Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:34 -07:00
Geliang Tang	b6c0838086	mptcp: remove addr and subflow in PM netlink This patch implements the remove announced addr and subflow logic in PM netlink. When the PM netlink removes an address, we traverse all the existing msk sockets to find the relevant sockets. We add a new list named anno_list in mptcp_pm_data, to record all the announced addrs. In the traversing, we check if it has been recorded. If it has been, we trigger the RM_ADDR signal. We also check if this address is in conn_list. If it is, we remove the subflow which using this local address. Since we call mptcp_pm_free_anno_list in mptcp_destroy, we need to move __mptcp_init_sock before the mptcp_is_enabled check in mptcp_init_sock. Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Paolo Abeni <pabeni@redhat.com> Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:33 -07:00
Geliang Tang	f58f065aa1	mptcp: add accept_subflow re-check The re-check of pm->accept_subflow with pm->lock held was missing, this patch fixed it. Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:33 -07:00
Geliang Tang	a877de0671	mptcp: add ADD_ADDR related mibs This patch added two mibs for ADD_ADDR, MPTCP_MIB_ADDADDR for receiving of the ADD_ADDR suboption with echo-flag=0, and MPTCP_MIB_ECHOADD for receiving the ADD_ADDR suboption with echo-flag=1. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:33 -07:00
Geliang Tang	6a6c05a8b0	mptcp: send out ADD_ADDR with echo flag When the ADD_ADDR suboption has been received, we need to send out the same ADD_ADDR suboption with echo-flag=1, and no HMAC. Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:33 -07:00
Geliang Tang	d0876b2284	mptcp: add the incoming RM_ADDR support This patch added the RM_ADDR option parsing logic: We parsed the incoming options to find if the rm_addr option is received, and called mptcp_pm_rm_addr_received to schedule PM work to a new status, named MPTCP_PM_RM_ADDR_RECEIVED. PM work got this status, and called mptcp_pm_nl_rm_addr_received to handle it. In mptcp_pm_nl_rm_addr_received, we closed the subflow matching the rm_id, and updated PM counter. Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Paolo Abeni <pabeni@redhat.com> Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:33 -07:00
Geliang Tang	5cb104ae55	mptcp: add the outgoing RM_ADDR support This patch added a new signal named rm_addr_signal in PM. On outgoing path, we called mptcp_pm_should_rm_signal to check if rm_addr_signal has been set. If it has been, we sent out the RM_ADDR option. Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:33 -07:00
Geliang Tang	f643b8032e	mptcp: rename addr_signal and the related functions This patch renamed addr_signal and the related functions with the explicit word "add". Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:58:33 -07:00
Vladimir Oltean	e2f9a8fe73	net: mscc: ocelot: always pass skb clone to ocelot_port_add_txtstamp_skb Currently, ocelot switchdev passes the skb directly to the function that enqueues it to the list of skb's awaiting a TX timestamp. Whereas the felix DSA driver first clones the skb, then passes the clone to this queue. This matters because in the case of felix, the common IRQ handler, which is ocelot_get_txtstamp(), currently clones the clone, and frees the original clone. This is useless and can be simplified by using skb_complete_tx_timestamp() instead of skb_tstamp_tx(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:47:56 -07:00
Cong Wang	0fedc63fad	net_sched: commit action insertions together syzbot is able to trigger a failure case inside the loop in tcf_action_init(), and when this happens we clean up with tcf_action_destroy(). But, as these actions are already inserted into the global IDR, other parallel process could free them before tcf_action_destroy(), then we will trigger a use-after-free. Fix this by deferring the insertions even later, after the loop, and committing all the insertions in a separate loop, so we will never fail in the middle of the insertions any more. One side effect is that the window between alloction and final insertion becomes larger, now it is more likely that the loop in tcf_del_walker() sees the placeholder -EBUSY pointer. So we have to check for error pointer in tcf_del_walker(). Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com Fixes: `0190c1d452` ("net: sched: atomically check-allocate action") Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:46:21 -07:00
Cong Wang	e49d8c22f1	net_sched: defer tcf_idr_insert() in tcf_action_init_1() All TC actions call tcf_idr_insert() for new action at the end of their ->init(), so we can actually move it to a central place in tcf_action_init_1(). And once the action is inserted into the global IDR, other parallel process could free it immediately as its refcnt is still 1, so we can not fail after this, we need to move it after the goto action validation to avoid handling the failure case after insertion. This is found during code review, is not directly triggered by syzbot. And this prepares for the next patch. Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-24 19:46:21 -07:00
Dmitry Safonov	96392ee5a1	xfrm/compat: Translate 32-bit user_policy from sockptr Provide compat_xfrm_userpolicy_info translation for xfrm setsocketopt(). Reallocate buffer and put the missing padding for 64-bit message. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-24 08:53:04 +02:00
Dmitry Safonov	5106f4a8ac	xfrm/compat: Add 32=>64-bit messages translator Provide the user-to-kernel translator under XFRM_USER_COMPAT, that creates for 32-bit xfrm-user message a 64-bit translation. The translation is afterwards reused by xfrm_user code just as if userspace had sent 64-bit message. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-24 08:53:03 +02:00
Dmitry Safonov	e11eb32de3	netlink/compat: Append NLMSG_DONE/extack to frag_list Modules those use netlink may supply a 2nd skb, (via frag_list) that contains an alternative data set meant for applications using 32bit compatibility mode. In such a case, netlink_recvmsg will use this 2nd skb instead of the original one. Without this patch, such compat applications will retrieve all netlink dump data, but will then get an unexpected EOF. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-24 08:53:03 +02:00
Dmitry Safonov	5f3eea6b7e	xfrm/compat: Attach xfrm dumps to 64=>32 bit translator Currently nlmsg_unicast() is used by functions that dump structures that can be different in size for compat tasks, see dump_one_state() and dump_one_policy(). The following nlmsg_unicast() users exist today in xfrm: Function \| Message can be different \| in size on compat -------------------------------------------\|------------------------------ xfrm_get_spdinfo() \| N xfrm_get_sadinfo() \| N xfrm_get_sa() \| Y xfrm_alloc_userspi() \| Y xfrm_get_policy() \| Y xfrm_get_ae() \| N Besides, dump_one_state() and dump_one_policy() can be used by filtered netlink dump for XFRM_MSG_GETSA, XFRM_MSG_GETPOLICY. Just as for xfrm multicast, allocate frag_list for compat skb journey down to recvmsg() which will give user the desired skb according to syscall bitness. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-24 08:53:03 +02:00
Dmitry Safonov	5461fc0c8d	xfrm/compat: Add 64=>32-bit messages translator Provide the kernel-to-user translator under XFRM_USER_COMPAT, that creates for 64-bit xfrm-user message a 32-bit translation and puts it in skb's frag_list. net/compat.c layer provides MSG_CMSG_COMPAT to decide if the message should be taken from skb or frag_list. (used by wext-core which has also an ABI difference) Kernel sends 64-bit xfrm messages to the userspace for: - multicast (monitor events) - netlink dumps Wire up the translator to xfrm_nlmsg_multicast(). Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-24 08:53:03 +02:00
Dmitry Safonov	c9e7c76d70	xfrm: Provide API to register translator module Add a skeleton for xfrm_compat module and provide API to register it in xfrm_state.ko. struct xfrm_translator will have function pointers to translate messages received from 32-bit userspace or to be sent to it from 64-bit kernel. module_get()/module_put() are used instead of rcu_read_lock() as the module will vmalloc() memory for translation. The new API is registered with xfrm_state module, not with xfrm_user as the former needs translator for user_policy set by setsockopt() and xfrm_user already uses functions from xfrm_state. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-24 08:53:03 +02:00
Florian Fainelli	ed409f3bba	net: dsa: b53: Configure VLANs while not filtering Update the B53 driver to support VLANs while not filtering. This requires us to enable VLAN globally within the switch upon driver initial configuration (dev->vlan_enabled). We also need to remove the code that dealt with PVID re-configuration in b53_vlan_filtering() since that function worked under the assumption that it would only be called to make a bridge VLAN filtering, or not filtering, and we would attempt to move the port's PVID accordingly. Now that VLANs are programmed all the time, even in the case of a non-VLAN filtering bridge, we would be programming a default_pvid for the bridged switch ports. We need the DSA receive path to pop the VLAN tag if it is the bridge's default_pvid because the CPU port is always programmed tagged in the programmed VLANs. In order to do so we utilize the dsa_untag_bridge_pvid() helper introduced in the commit before within net/dsa/tag_brcm.c. Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 18:13:45 -07:00
Vladimir Oltean	412a1526d0	net: dsa: untag the bridge pvid from rx skbs Currently the bridge untags VLANs present in its VLAN groups in __allowed_ingress() only when VLAN filtering is enabled. But when a skb is seen on the RX path as tagged with the bridge's pvid, and that bridge has vlan_filtering=0, and there isn't any 8021q upper with that VLAN either, then we have a problem. The bridge will not untag it (since it is supposed to remain VLAN-unaware), and pvid-tagged communication will be broken. There are 2 situations where we can end up like that: 1. When installing a pvid in egress-tagged mode, like this: ip link add dev br0 type bridge vlan_filtering 0 ip link set swp0 master br0 bridge vlan del dev swp0 vid 1 bridge vlan add dev swp0 vid 1 pvid This happens because DSA configures the VLAN membership of the CPU port using the same flags as swp0 (in this case "pvid and not untagged"), in an attempt to copy the frame as-is from ingress to the CPU. However, in this case, the packet may arrive untagged on ingress, it will be pvid-tagged by the ingress port, and will be sent as egress-tagged towards the CPU. Otherwise stated, the CPU will see a VLAN tag where there was none to speak of on ingress. When vlan_filtering is 1, this is not a problem, as stated in the first paragraph, because __allowed_ingress() will pop it. But currently, when vlan_filtering is 0 and we have such a VLAN configuration, we need an 8021q upper (br0.1) to be able to ping over that VLAN, which is not symmetrical with the vlan_filtering=1 case, and therefore, confusing for users. Basically what DSA attempts to do is simply an approximation: try to copy the skb with (or without) the same VLAN all the way up to the CPU. But DSA drivers treat CPU port VLAN membership in various ways (which is a good segue into situation 2). And some of those drivers simply tell the CPU port to copy the frame unmodified, which is the golden standard when it comes to VLAN processing (therefore, any driver which can configure the hardware to do that, should do that, and discard the VLAN flags requested by DSA on the CPU port). 2. Some DSA drivers always configure the CPU port as egress-tagged, in an attempt to recover the classified VLAN from the skb. These drivers cannot work at all with untagged traffic when bridged in vlan_filtering=0 mode. And they can't go for the easy "just keep the pvid as egress-untagged towards the CPU" route, because each front port can have its own pvid, and that might require conflicting VLAN membership settings on the CPU port (swp1 is pvid for VID 1 and egress-tagged for VID 2; swp2 is egress-taggeed for VID 1 and pvid for VID 2; with this simplistic approach, the CPU port, which is really a separate hardware entity and has its own VLAN membership settings, would end up being egress-untagged in both VID 1 and VID 2, therefore losing the VLAN tags of ingress traffic). So the only thing we can do is to create a helper function for resolving the problematic case (that is, a function which untags the bridge pvid when that is in vlan_filtering=0 mode), which taggers in need should call. It isn't called from the generic DSA receive path because there are drivers that fall neither in the first nor second category. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 18:13:45 -07:00
Tian Tao	ea6754aef2	net: switchdev: Fixed kerneldoc warning Update kernel-doc line comments to fix warnings reported by make W=1. net/switchdev/switchdev.c:413: warning: Function parameter or member 'extack' not described in 'call_switchdev_notifiers' Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:46:31 -07:00
Mauro Carvalho Chehab	de2b541b3b	net: fix a new kernel-doc warning at dev.c kernel-doc expects the function prototype to be just after the kernel-doc markup, as otherwise it will get it all wrong: ./net/core/dev.c:10036: warning: Excess function parameter 'dev' description in 'WAIT_REFS_MIN_MSECS' Fixes: `0e4be9e57e` ("net: use exponential backoff in netdev_wait_allrefs") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:42:54 -07:00
Mat Martineau	ef59b1953c	mptcp: Wake up MPTCP worker when DATA_FIN found on a TCP FIN packet When receiving a DATA_FIN MPTCP option on a TCP FIN packet, the DATA_FIN information would be stored but the MPTCP worker did not get scheduled. In turn, the MPTCP socket state would remain in TCP_ESTABLISHED and no blocked operations would be awakened. TCP FIN packets are seen by the MPTCP socket when moving skbs out of the subflow receive queues, so schedule the MPTCP worker when a skb with DATA_FIN but no data payload is moved from a subflow queue. Other cases (DATA_FIN on a bare TCP ACK or on a packet with data payload) are already handled. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/84 Fixes: `43b54c6ee3` ("mptcp: Use full MPTCP-level disconnect state machine") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:30:52 -07:00
Nikolay Aleksandrov	36cfec7359	net: bridge: mcast: when forwarding handle filter mode and blocked flag We need to avoid forwarding to ports in MCAST_INCLUDE filter mode when the mdst entry is a *,G or when the port has the blocked flag. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:35 -07:00
Nikolay Aleksandrov	094b82fd53	net: bridge: mcast: handle host state Since host joins are considered as EXCLUDE {} joins we need to reflect that in all of *,G ports' S,G entries. Since the S,Gs can have host_joined == true only set automatically we can safely set it to false when removing all automatically added entries upon S,G delete. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	9116ffbf1d	net: bridge: mcast: add support for blocked port groups When excluding S,G entries we need a way to block a particular S,G,port. The new port group flag is managed based on the source's timer as per RFCs 3376 and 3810. When a source expires and its port group is in EXCLUDE mode, it will be blocked. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	8266a0491e	net: bridge: mcast: handle port group filter modes We need to handle group filter mode transitions and initial state. To change a port group's INCLUDE -> EXCLUDE mode (or when we have added a new port group in EXCLUDE mode) we need to add that port to all of ,G ports' S,G entries for proper replication. When the EXCLUDE state is changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be called after the source list processing because the assumption is that all of the group's S,G entries will be created before transitioning to EXCLUDE mode, i.e. most importantly its blocked entries will already be added so it will not get automatically added to them. The transition EXCLUDE -> INCLUDE happens only when a port group timer expires, it requires us to remove that port from all of ,G ports' S,G entries where it was automatically added previously. Finally when we are adding a new S,G entry we must add all of ,G's EXCLUDE ports to it. In order to distinguish automatically added ,G EXCLUDE ports we have a new port group flag - MDB_PG_FLAGS_STAR_EXCL. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	b08123684b	net: bridge: mcast: install S,G entries automatically based on reports This patch adds support for automatic install of S,G mdb entries based on the port group's source list and the source entry's timer. Once installed the S,G will be used when forwarding packets if the approprate multicast/mld versions are set. A new source flag called BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry installed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	085b53c8be	net: bridge: mcast: add sg_port rhashtable To speedup S,G forward handling we need to be able to quickly find out if a port is a member of an S,G group. To do that add a global S,G port rhashtable with key: source addr, group addr, protocol, vid (all br_ip fields) and port pointer. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	8f8cb77e0b	net: bridge: mcast: add rt_protocol field to the port group struct We need to be able to differentiate between pg entries created by user-space and the kernel when we start generating S,G entries for IGMPv3/MLDv2's fast path. User-space entries are created by default as RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can allow user-space to provide the entry rt_protocol so we can differentiate between who added the entries specifically (e.g. clag, admin, frr etc). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	7d07a68c25	net: bridge: mcast: when igmpv3/mldv2 are enabled lookup (S,G) first, then (,G) If (S,G) entries are enabled (igmpv3/mldv2) then look them up first. If there isn't a present (S,G) entry then try to find (,G). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	88d4bd1804	net: bridge: mdb: add support for add/del/dump of entries with source Add new mdb attributes (MDBE_ATTR_SOURCE for setting, MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb entries with a source address (S,G). New S,G entries are created with filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and IPv6, they're validated and parsed based on their protocol. S,G host joined entries which are added by user are not allowed yet. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	9c4258c78a	net: bridge: mdb: add support to extend add/del commands Since the MDB add/del code expects an exact struct br_mdb_entry we can't really add any extensions, thus add a new nested attribute at the level of MDBA_SET_ENTRY called MDBA_SET_ENTRY_ATTRS which will be used to pass all new options via netlink attributes. This patch doesn't change anything functionally since the new attribute is not used yet, only parsed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	eab3227b12	net: bridge: mcast: rename br_ip's u member to dst Since now we have src in br_ip, u no longer makes sense so rename it to dst. No functional changes. v2: fix build with CONFIG_BATMAN_ADV_MCAST CC: Marek Lindner <mareklindner@neomailbox.ch> CC: Simon Wunderlich <sw@simonwunderlich.de> CC: Antonio Quartulli <a@unstable.cc> CC: Sven Eckelmann <sven@narfation.org> CC: b.a.t.m.a.n@lists.open-mesh.org Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	deb965662d	net: bridge: mcast: use br_ip's src for src groups and querier address Now that we have src and dst in br_ip it is logical to use the src field for the cases where we need to work with a source address such as querier source address and group source address. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	83f7398ea5	net: bridge: mdb: use extack in br_mdb_add() and br_mdb_add_group() Pass and use extack all the way down to br_mdb_add_group(). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	7eea629d07	net: bridge: mdb: move all port and bridge checks to br_mdb_add To avoid doing duplicate device checks and searches (the same were done in br_mdb_add and __br_mdb_add) pass the already found port to __br_mdb_add and pull the bridge's netif_running and enabled multicast checks to br_mdb_add. This would also simplify the future extack errors. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	2ac95dfe25	net: bridge: mdb: use extack in br_mdb_parse() We can drop the pr_info() calls and just use extack to return a meaningful error to user-space when br_mdb_parse() fails. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
David S. Miller	6d772f328d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-09-23 The following pull-request contains BPF updates for your net-next tree. We've added 95 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 4211 insertions(+), 2040 deletions(-). The main changes are: 1) Full multi function support in libbpf, from Andrii. 2) Refactoring of function argument checks, from Lorenz. 3) Make bpf_tail_call compatible with functions (subprograms), from Maciej. 4) Program metadata support, from YiFei. 5) bpf iterator optimizations, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:11:11 -07:00
Parav Pandit	c49a94405b	devlink: Enhance policy to validate port type input value Use range checking facility of nla_policy to validate port type attribute input value is valid or not. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-22 17:38:42 -07:00
Parav Pandit	ba356c9098	devlink: Enhance policy to validate eswitch mode value Use range checking facility of nla_policy to validate eswitch mode input attribute value is valid or not. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-22 17:38:42 -07:00
David S. Miller	3ab0a7a0c3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-22 16:45:34 -07:00
Linus Torvalds	d3017135c4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: - fix failure to add bond interfaces to a bridge, the offload-handling code was too defensive there and recent refactoring unearthed that. Users complained (Ido) - fix unnecessarily reflecting ECN bits within TOS values / QoS marking in TCP ACK and reset packets (Wei) - fix a deadlock with bpf iterator. Hopefully we're in the clear on this front now... (Yonghong) - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel) - fix AQL on mt76 devices with FW rate control and add a couple of AQL issues in mac80211 code (Felix) - fix authentication issue with mwifiex (Maximilian) - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro) - fix exception handling for multipath routes via same device (David Ahern) - revert back to a BH spin lock flavor for nsid_lock: there are paths which do require the BH context protection (Taehee) - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke) - fix ife module load deadlock (Cong) - make an adjustment to netlink reply message type for code added in this release (the sole change touching uAPI here) (Michal) - a number of fixes for small NXP and Microchip switches (Vladimir) [ Pull request acked by David: "you can expect more of this in the future as I try to delegate more things to Jakub" ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits) net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU net: Update MAINTAINERS for MediaTek switch driver net/mlx5e: mlx5e_fec_in_caps() returns a boolean net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock net/mlx5e: kTLS, Fix leak on resync error flow net/mlx5e: kTLS, Add missing dma_unmap in RX resync net/mlx5e: kTLS, Fix napi sync and possible use-after-free net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats() net/mlx5e: Fix multicast counter not up-to-date in "ip -s" net/mlx5e: Fix endianness when calculating pedit mask first bit net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported net/mlx5e: CT: Fix freeing ct_label mapping net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready net/mlx5e: Use synchronize_rcu to sync with NAPI net/mlx5e: Use RCU to protect rq->xdp_prog ...	2020-09-22 14:43:50 -07:00
Eric Dumazet	d5e4d0a5e6	inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute User space could send an invalid INET_DIAG_REQ_PROTOCOL attribute as caught by syzbot. BUG: KMSAN: uninit-value in inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline] BUG: KMSAN: uninit-value in __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147 CPU: 0 PID: 8505 Comm: syz-executor174 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219 inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline] __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147 inet_diag_dump_compat+0x2a5/0x380 net/ipv4/inet_diag.c:1254 netlink_dump+0xb73/0x1cb0 net/netlink/af_netlink.c:2246 __netlink_dump_start+0xcf2/0xea0 net/netlink/af_netlink.c:2354 netlink_dump_start include/linux/netlink.h:246 [inline] inet_diag_rcv_msg_compat+0x5da/0x6c0 net/ipv4/inet_diag.c:1288 sock_diag_rcv_msg+0x24f/0x620 net/core/sock_diag.c:256 netlink_rcv_skb+0x6d7/0x7e0 net/netlink/af_netlink.c:2470 sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x11c8/0x1490 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x173a/0x1840 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d1/0x820 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x441389 Code: e8 fc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff3b02ce98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441389 RDX: 0000000000000000 RSI: 0000000020001500 RDI: 0000000000000003 RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000402130 R13: 00000000004021c0 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:143 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:126 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:80 slab_alloc_node mm/slub.c:2907 [inline] __kmalloc_node_track_caller+0x9aa/0x12f0 mm/slub.c:4511 __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0x35f/0xb30 net/core/skbuff.c:210 alloc_skb include/linux/skbuff.h:1094 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline] netlink_sendmsg+0xdb9/0x1840 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d1/0x820 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `3f935c75eb` ("inet_diag: support for wider protocol numbers") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Christoph Paasch <cpaasch@apple.com> Cc: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 17:38:51 -07:00
Vladimir Oltean	99f62a7460	net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU When calling the RCU brother of br_vlan_get_pvid(), lockdep warns: ============================= WARNING: suspicious RCU usage 5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted ----------------------------- net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage! Call trace: lockdep_rcu_suspicious+0xd4/0xf8 __br_vlan_get_pvid+0xc0/0x100 br_vlan_get_pvid_rcu+0x78/0x108 The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group() which calls rtnl_dereference() instead of rcu_dereference(). In turn, rtnl_dereference() calls rcu_dereference_protected() which assumes operation under an RCU write-side critical section, which obviously is not the case here. So, when the incorrect primitive is used to access the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may cause various unexpected problems. I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot share the same implementation. So fix the bug by splitting the 2 functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups under proper locking annotations. Fixes: `7582f5b70f` ("bridge: add br_vlan_get_pvid_rcu()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 17:37:44 -07:00
YueHaibing	18cd9b00ff	ipvs: Remove unused macros They are not used since commit `e4ff675130` ("ipvs: add sync_maxlen parameter for the sync daemon") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-22 01:36:20 +02:00
Florian Westphal	1702ad79d3	netfilter: conntrack: proc: rename stat column Rename 'searched' column to 'clashres' (same len). conntrack(8) using the old /proc interface (ctnetlink not available) shows: cpu=0 entries=4784 clashres=2292 [..] Another alternative is to add another column, but this increases the number of always-0 columns. Fixes: `bc92470413` ("netfilter: conntrack: add clash resolution stat counter") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-22 01:33:35 +02:00
Yonghong Song	c69d2ddb20	bpf: Using rcu_read_lock for bpf_sk_storage_map iterator If a bucket contains a lot of sockets, during bpf_iter traversing a bucket, concurrent userspace bpf_map_update_elem() and bpf program bpf_sk_storage_{get,delete}() may experience some undesirable delays as they will compete with bpf_iter for bucket lock. Note that the number of buckets for bpf_sk_storage_map is roughly the same as the number of cpus. So if there are lots of sockets in the system, each bucket could contain lots of sockets. Different actual use cases may experience different delays. Here, using selftest bpf_iter subtest bpf_sk_storage_map, I hacked the kernel with ktime_get_mono_fast_ns() to collect the time when a bucket was locked during bpf_iter prog traversing that bucket. This way, the maximum incurred delay was measured w.r.t. the number of elements in a bucket. # elems in each bucket delay(ns) 64 17000 256 72512 2048 875246 The potential delays will be further increased if we have even more elemnts in a bucket. Using rcu_read_lock() is a reasonable compromise here. It may lose some precision, e.g., access stale sockets, but it will not hurt performance of bpf program or user space application which also tries to get/delete or update map elements. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200916224645.720172-1-yhs@fb.com	2020-09-21 15:31:53 -07:00
Lorenz Bauer	9436ef6e86	bpf: Allow specifying a BTF ID per argument in function protos Function prototypes using ARG_PTR_TO_BTF_ID currently use two ways to signal which BTF IDs are acceptable. First, bpf_func_proto.btf_id is an array of IDs, one for each argument. This array is only accessed up to the highest numbered argument that uses ARG_PTR_TO_BTF_ID and may therefore be less than five arguments long. It usually points at a BTF_ID_LIST. Second, check_btf_id is a function pointer that is called by the verifier if present. It gets the actual BTF ID of the register, and the argument number we're currently checking. It turns out that the only user check_arg_btf_id ignores the argument, and is simply used to check whether the BTF ID has a struct sock_common at it's start. Replace both of these mechanisms with an explicit BTF ID for each argument in a function proto. Thanks to btf_struct_ids_match this is very flexible: check_arg_btf_id can be replaced by requiring struct sock_common. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200921121227.255763-5-lmb@cloudflare.com	2020-09-21 15:00:40 -07:00
David S. Miller	c5a2a132a3	linux-can-next-for-5.10-20200921 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl9onIcTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqf2aCACpwHz9PLKTX4zNDnDRsA7x4c+Gk2nU FOgwodyk9ayax1lLYkgq3ynlb5zM4tRNKLJguoJX69pVvw87uPwBhEbQWQkFFZoW ErkAmYgzGeoA2bu/iJnbsktm7uyEv5a5SntZ+3DSfdnJhuZoU2WfgGa0Z6yDYlWG W6/Dy66XGKzCrK1iFOyvvaNDOhS+Jkul1JWcyhrztQeCbIPTiqVztaE/QA+5cDoT ScO4LdHx7U49XQMiNUjrHFv8qso7fVPicqnefWb19w22wixPzUSGNjCOk284qLn9 eldxYwN77sC8VzPIiyKv7sW8z1o+lq1uTeO5Aa+ATqhCdwTmeNIVfJsp =C8ET -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.10-20200921' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2020-09-21 this is a pull request of 38 patches for net-next. the first 5 patches are by Colin Ian King, Alexandre Belloni and me and they fix various spelling mistakes. The next patch is by me and fixes the indention in the CAN raw protocol according to the kernel coding style. Diego Elio Pettenò contributes two patches to fix dead links in CAN's Kconfig. Masahiro Yamada's patch removes the "WITH Linux-syscall-note" from SPDX tag of C files. AThe next 4 patches are by me and target the CAN device infrastructure and add error propagation and improve the output of various messages to ease driver development and debugging. YueHaibing's patch for the c_can driver removes an unused inline function. Next follows another patch by Colin Ian King, which removes the unneeded initialization of a variable in the mcba_usb driver. A patch by me annotates a fallthrough in the mscan driver. The ti_hecc driver is converted to use devm_platform_ioremap_resource_byname() in a patch by Dejin Zheng. Liu Shixin's patch converts the pcan_usb_pro driver to make use of le32_add_cpu() instead of open coding it. Wang Hai's patch for the peak_pciefd_main driver removes an unused makro. Vaibhav Gupta's patch converts the pch_can driver to generic power management. Stephane Grosjean improves the pcan_usb usb driver by first documenting the commands sent to the device and by adding support of rxerr/txerr counters. The next patch is by me and cleans up the Kconfig of the CAN SPI drivers. The next 6 patches all target the mcp251x driver, they are by Timo Schlüßler, Andy Shevchenko, Tim Harvey and me. They update the DT bindings documentation, sort the include files alphabetically, add GPIO support, make use of the readx_poll_timeout() helper, and add support for half duplex SPI-controllers. Wolfram Sang contributes a patch to update the contact email address in the mscan driver, while Zhang Changzhong updates the clock handling. The next patch is by and updates the rx-offload infrastructure to support callback less usage. The last 6 patches add support for the mcp25xxfd CAN SPI driver. First the dt-bindings are added by Oleksij Rempel, the regmap infrastructure and the main driver is contributed by me. Kurt Van Dijck adds listen-only support, Manivannan Sadhasivam adds himself as maintainer, and Thomas Kopp himself as a reviewer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 14:57:05 -07:00
David S. Miller	ae4dd9a8c2	This time we have: * some AP-side infrastructure for FILS discovery and unsolicited probe resonses * a major rework of the encapsulation/header conversion offload from Felix, to fit better with e.g. AP_VLAN interfaces * performance fix for VHT A-MPDU size, don't limit to HT * some initial patches for S1G (sub 1 GHz) support, more will come in this area * minor cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl9oqc0ACgkQB8qZga/f l8TdMxAAnGTS7p6n//5RrdNmbT/UgMPBPhdHJTXZy7q1z3hDI5xrWRrx26fQeB31 exslnHd6q28bNGKscExcLRsP63SKn+P3PdUWJzbusya46OZnMgajNQYiQkL3ISfS MkF0Uyv7GmkkRn8nBg1ZbLMtzONxqn9uT6o3qyNODHwYgf/QjlEWstL7Q0fEy5UU gftBz4WOhMOsEQQu3XvLu2hMd6EI14ZWkChdwboYJ29GgJrCYnhO/0B4QpVJxBpp ROYQgF3c3Fis2tJpyZ0FNqG7T16MhjOD2+hyNAcX2+ZkkbSzn6B89jOd/qeowJOT FwYbJNQTUxBYH8+IYZeGVWRMyPuZdazrfccbTd9Lfrk3y/Hi+cuuneFFXykST6Zd EqRkumAOJY9b6HgcGTkfRFdY4SU4v3lcAOcAgg1fJVFvmLKcHfhD3xKyGz3Q1JLv v9x/9S+G69YWyxl09rMY9c78S6enxKmhJdV1wnv3zbALvqNTa2OpaZnCBx+hmkEy NoSowwYeb0PANk9PfA1N0+uQPVKu8SGT0FhxLj6NEAhCbbVBTqZ3z4TkdYByy/a0 L80zk1ziyC4uS24+TcILHyCzRHkdLdsD8lEeVTDZIusU6nXX9imZ+yWGwIkbi35Y v+6fUns/1UMJqBc3J/wtr9pamoIBJT3Qe0lHC4Uy7CxuO5S1UVM= =Bnsm -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-net-next-2020-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This time we have: * some AP-side infrastructure for FILS discovery and unsolicited probe resonses * a major rework of the encapsulation/header conversion offload from Felix, to fit better with e.g. AP_VLAN interfaces * performance fix for VHT A-MPDU size, don't limit to HT * some initial patches for S1G (sub 1 GHz) support, more will come in this area * minor cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 14:55:09 -07:00
David S. Miller	25b8ab916d	Just a few fixes: * fix using HE on 2.4 GHz * AQL (airtime queue limit) estimation & VHT160 fix * do not oversize A-MPDUs if local capability is smaller than peer's * fix radiotap on 6 GHz to not put 2.4 GHz flag * fix Kconfig for lib80211 * little fixlet for 6 GHz channel number / frequency conversion -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl9oqO8ACgkQB8qZga/f l8TKTRAAg3M+tx6FSz5zaxb5JuILSkxLvMkFeYgWpVjVaDRKN4/l3XSI68dswZtU NjNiA8RQ0+lOLOyvmUD38pvEPUQuD+3EYl7f7vsRgNV2ylHqdII6xCaqWYXWdu5h EVtA58SSbqw/RHyCGA1IQ5lO57fCSp1d4+0tYF0FzyfV2Qg8iBaYiGgeSc9xz28Q Cw0l5jqEyk0h73ptm/zznESJKFokrs0cQZ0R2+IbrtJmTXTMwSlKWJt00toT42ZL TXFfMQvtnZyL0HuWxilYBOFzp9Np4i7CGIyDechMlPej5+euzUrg6yGS42uMaR4c qtgynL5KP6dbS/tOnCH4FmFgA25mIWzOOkGCuOxJjVzxE/7bqqfi6hGFxdhtmnoW 6889XSX3ueQMiOPV+8fATMz5oi5N6OHrnkyHirq7RiwVv7DodRLcNP4rcbnaFcIa j2OQ+uAmmOb0x3n/t1L2aozqPssS/xKGPHcAP1su5lkeYzz6leiTlqNxGxIA16C0 QmByD8qf19R3OxPAlnTRRMQlMiCs4yGqjsBRH4u053FsNnSmPMw+si++zDKu7pP4 q9zv4a+FY/MZyUNx4k+XFynk18nF8Xx1aNLADCe+AtN+smhUeITapkvcHN4iae+1 89XnqVJaEGvnFc2F16K1cfVEmERQR9zY/ufmtFthu7S+xgxosr8= =ZjBk -----END PGP SIGNATURE----- Merge tag 'mac80211-for-net-2020-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just a few fixes: * fix using HE on 2.4 GHz * AQL (airtime queue limit) estimation & VHT160 fix * do not oversize A-MPDUs if local capability is smaller than peer's * fix radiotap on 6 GHz to not put 2.4 GHz flag * fix Kconfig for lib80211 * little fixlet for 6 GHz channel number / frequency conversion ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 14:54:35 -07:00
Xu Wang	91b2c9a0fd	ipv6: route: convert comma to semicolon Replace a comma between expression statements by a semicolon. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 14:52:18 -07:00
Jing Xiangfeng	c8c33b80f4	net: unix: remove redundant assignment to variable 'err' After commit `37ab4fa784` ("net: unix: allow bind to fail on mutex lock"), the assignment to err is redundant. So remove it. Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 14:51:37 -07:00
Paolo Abeni	7d58e65558	net-sysfs: add backlog len and CPU id to softnet data Currently the backlog status in not exposed to user-space. Since long backlogs (or simply not empty ones) can be a source of noticeable latency, -RT deployments need some way to inspect it. Additionally, there isn't a direct match between 'softnet_stat' lines and the related CPU - sd for offline CPUs are not dumped - so this patch also includes the CPU id into such entry. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 13:56:37 -07:00
Julia Lawall	ed38c33f1c	xprtrdma: drop double zeroing sg_init_table zeroes its first argument, so the allocation of that argument doesn't have to. the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,n,flags; @@ x = - kcalloc + kmalloc_array (n,sizeof(*x),flags) ... sg_init_table(x,n) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 12:15:25 -04:00
Alexander A. Klimov	0bdd4cea12	Replace HTTP links with HTTPS ones: NFS, SUNRPC, and LOCKD clients Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w\|/)`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:10 -04:00
Chuck Lever	5589cc4778	SUNRPC: Remove remaining dprintks from sched.c Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	721a1d388b	SUNRPC: Remove dprintk call sites in RPC queuing functions Remove redundant call sites or call sites that are already covered by tracepoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	1466c22163	SUNRPC: Clean up RPC scheduler tracepoints Remove several redundant dprintk call sites, and replace a couple of potentially useful ones with tracepoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	c3adcc7dfb	SUNRPC: Replace rpcbind dprintk call sites with tracepoints In many cases, tracepoints already report these errors. In others, the dprintks were mainly useful when this code was less mature. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	1e664987a9	SUNRPC: Remove more dprintks in rpcb_clnt.c Clean up: These are superfluous now that rpc_create() and friends have tracepoints to report errors. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	15a798d6ce	SUNRPC: Remove dprintk call sites in rpcbind XDR functions Clean up: Other XDR functions no longer have dprintk call sites. These were added during development and can be removed now that the code is mature. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	ac1ae53421	SUNRPC: Hoist trace_xprtrdma_op_setport into generic code Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	e465cc3fa8	SUNRPC: Remove rpcb_getport_async dprintk call sites In many cases, tracepoints already report these errors. In others, the dprintks were mainly useful when this code was less mature. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	42ebfc2cbf	SUNRPC: Clean up call_bind_status() observability Time to remove dprintk call sites in here. Regarding the rpc_bind_status tracepoint: It's friendlier to administrators if they don't have to look up the error code to figure out what went wrong. Replace trace_rpc_bind_status with a set of tracepoints that report more specifically what the problem was, and what RPC program/version was being queried. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	fd66e2a79d	SUNRPC: Remove dprintk call site in call_decode Clean up. When enabled, this dprintk adds a line in /var/log/messages after every RPC that reports the task ID (no connection to on the wire XID values) and the RPC's result (no connection to the program, operation, or the arguments and results). Thus it's value is pretty low. Let's remove it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	7c8099f6ad	SUNRPC: Trace call_refresh events Clean up: Replace dprintk call sites. Note that rpc_call_rpcerror() already has a trace point, so perhaps adding trace_rpc_refresh_status() isn't necessary. However, it does report a particular category of error. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	914cdcc78a	SUNRPC: Add trace_rpc_timeout_status() For a long while we've wanted a tracepoint that fires when a major timeout is reported in the system log. Such a tracepoint can be attached to other actions that can take place when a timeout is detected (eg, server or connection health assessment). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	6f9f17287e	SUNRPC: Mitigate cond_resched() in xprt_transmit() The original purpose of this expensive call is to prevent a long queue of requests from blocking other work. The cond_resched() call is unnecessary after just a single send operation. For longer queues, instead of invoking the kernel scheduler, simply release the transport send lock and return to the RPC scheduler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	db0a86c426	SUNRPC: Replace connect dprintk call sites with a tracepoint This trace event can be used to audit transport connections from the client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	0ec36cc9cd	SUNRPC: Remove dprintk call site in call_start() Clean up: The rpc_rpc_request tracepoint serves the same purpose. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	6387039d6d	SUNRPC: Remove the dprint_status() macro Clean up: The rpc_task_run_action tracepoint serves the same purpose. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:09 -04:00
Chuck Lever	015747d296	SUNRPC: Replace dprintk() call site in xs_nospace() "no socket space" is an exceptional and infrequent condition that troubleshooters want to know about. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:08 -04:00
Chuck Lever	9ce07ae5eb	SUNRPC: Replace dprintk() call site in xprt_prepare_transmit Generate a trace event when an RPC request is queued without being sent immediately. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:08 -04:00
Chuck Lever	09d2ba0cb1	SUNRPC: Update debugging instrumentation in xprt_do_reserve() Replace a dprintk() with a tracepoint. The tracepoint marks the point where an RPC request is assigned an XID. Additional clean up: Remove trace_xprt_enq_xmit, which reports much the same thing. That tracepoint was added for debugging commit `918f3c1fe8` ("SUNRPC: Improve latency for interactive tasks"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:08 -04:00
Chuck Lever	7806948753	SUNRPC: Remove debugging instrumentation from xprt_release These instruments don't appear to add any substantial value. We already have this at the termination of each RPC: iozone-2617 [002] 975.713126: rpc_stats_latency: task:418@5 xid=0x260eab5d nfsv3 LOOKUP backlog=15 rtt=32 execute=58 iozone-2617 [002] 975.713127: xprt_release_cong: task:418@5 snd_task:4294967295 cong=256 cwnd=16384 iozone-2617 [002] 975.713127: xprt_put_cong: task:418@5 snd_task:4294967295 cong=0 cwnd=16384 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:08 -04:00
Chuck Lever	06e234c613	SUNRPC: Hoist trace_xprtrdma_op_allocate into generic code Introduce a tracepoint in call_allocate that reports the exact sizes in the RPC buffer allocation request and the status of the result. This helps catch problems with XDR buffer provisioning, and replaces transport-specific debugging instrumentation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:08 -04:00
Chuck Lever	e4378a0fdd	SUNRPC: Remove trace_xprt_complete_rqst() Request completion is already recorded by an "rpc_task_wakeup queue=xprt_pending" trace record. A subsequent rpc_xdr_recvfrom trace record shows the number of bytes received. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:08 -04:00
Olga Kornievskaia	88428cc4ae	SUNRPC dont update timeout value on connection reset Current behaviour: every time a v3 operation is re-sent to the server we update (double) the timeout. There is no distinction between whether or not the previous timer had expired before the re-sent happened. Here's the scenario: 1. Client sends a v3 operation 2. Server RST-s the connection (prior to the timeout) (eg., connection is immediately reset) 3. Client re-sends a v3 operation but the timeout is now 120sec. As a result, an application sees 2mins pause before a retry in case server again does not reply. Where as if a connection reset didn't change the timeout value, the client would have re-tried (the 3rd time) after 60secs. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 10:21:08 -04:00

... 3 4 5 6 7 ...

62709 Commits