linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Jiri Pirko	3a5e523479	devlink: remove pointless data_len arg from region snapshot create The size of the snapshot has to be the same as the size of the region, therefore no need to pass it again during snapshot creation. Remove the arg and use region->size instead. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 11:21:46 -07:00
Eric Dumazet	1a9914884d	tcp: batch calls to sk_flush_backlog() Starting from commit `d41a69f1d3` ("tcp: make tcp_sendmsg() aware of socket backlog") loopback flows got hurt, because for each skb sent, the socket receives an immediate ACK and sk_flush_backlog() causes extra work. Intent was to not let the backlog grow too much, but we went a bit too far. We can check the backlog every 16 skbs (about 1MB chunks) to increase TCP over loopback performance by about 15 % Note that the call to sk_flush_backlog() handles a single ACK, thanks to coalescing done on backlog, but cleans the 16 skbs found in rtx rb-tree. Reported-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 11:03:27 -07:00
Ivan Khoronzhuk	d9973cec9d	xdp: xdp_umem: fix umem pages mapping for 32bits systems Use kmap instead of page_address as it's not always in low memory. Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-09 18:02:19 +02:00
David Howells	e8c3af6bb3	rxrpc: Don't bother generating maxSkew in the ACK packet Don't bother generating maxSkew in the ACK packet as it has been obsolete since AFS 3.1. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>	2019-08-09 15:24:00 +01:00
David Howells	730c5fd42c	rxrpc: Fix local endpoint refcounting The object lifetime management on the rxrpc_local struct is broken in that the rxrpc_local_processor() function is expected to clean up and remove an object - but it may get requeued by packets coming in on the backing UDP socket once it starts running. This may result in the assertion in rxrpc_local_rcu() firing because the memory has been scheduled for RCU destruction whilst still queued: rxrpc: Assertion failed ------------[ cut here ]------------ kernel BUG at net/rxrpc/local_object.c:468! Note that if the processor comes around before the RCU free function, it will just do nothing because ->dead is true. Fix this by adding a separate refcount to count active users of the endpoint that causes the endpoint to be destroyed when it reaches 0. The original refcount can then be used to refcount objects through the work processor and cause the memory to be rcu freed when that reaches 0. Fixes: `4f95dd78a7` ("rxrpc: Rework local endpoint management") Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>	2019-08-09 15:21:19 +01:00
Pablo Neira Ayuso	1e5b2471bc	netfilter: nf_flow_table: teardown flow timeout race Flows that are in teardown state (due to RST / FIN TCP packet) still have their offload flag set on. Hence, the conntrack garbage collector may race to undo the timeout adjustment that the fixup routine performs, leaving the conntrack entry in place with the internal offload timeout (one day). Update teardown flow state to ESTABLISHED and set tracking to liberal, then once the offload bit is cleared, adjust timeout if it is more than the default fixup timeout (conntrack might already have set a lower timeout from the packet path). Fixes: `da5984e510` ("netfilter: nf_flow_table: add support for sending flows back to the slow path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-09 14:41:21 +02:00
Pablo Neira Ayuso	3e68db2f64	netfilter: nf_flow_table: conntrack picks up expired flows Update conntrack entry to pick up expired flows, otherwise the conntrack entry gets stuck with the internal offload timeout (one day). The TCP state also needs to be adjusted to ESTABLISHED state and tracking is set to liberal mode in order to give conntrack a chance to pick up the expired flow. Fixes: `ac2a66665e` ("netfilter: add generic flow table infrastructure") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-09 14:41:20 +02:00
Pablo Neira Ayuso	6a0a8d10a3	netfilter: nf_tables: use-after-free in failing rule with bound set If a rule that has already a bound anonymous set fails to be added, the preparation phase releases the rule and the bound set. However, the transaction object from the abort path still has a reference to the set object that is stale, leading to a use-after-free when checking for the set->bound field. Add a new field to the transaction that specifies if the set is bound, so the abort path can skip releasing it since the rule command owns it and it takes care of releasing it. After this update, the set->bound field is removed. [ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434 [ 24.657858] Mem abort info: [ 24.660686] ESR = 0x96000004 [ 24.663769] Exception class = DABT (current EL), IL = 32 bits [ 24.669725] SET = 0, FnV = 0 [ 24.672804] EA = 0, S1PTW = 0 [ 24.675975] Data abort info: [ 24.678880] ISV = 0, ISS = 0x00000004 [ 24.682743] CM = 0, WnR = 0 [ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000 [ 24.692207] [0000000000040434] pgd=0000000000000000 [ 24.697119] Internal error: Oops: 96000004 [#1] SMP [...] [ 24.889414] Call trace: [ 24.891870] __nf_tables_abort+0x3f0/0x7a0 [ 24.895984] nf_tables_abort+0x20/0x40 [ 24.899750] nfnetlink_rcv_batch+0x17c/0x588 [ 24.904037] nfnetlink_rcv+0x13c/0x190 [ 24.907803] netlink_unicast+0x18c/0x208 [ 24.911742] netlink_sendmsg+0x1b0/0x350 [ 24.915682] sock_sendmsg+0x4c/0x68 [ 24.919185] ___sys_sendmsg+0x288/0x2c8 [ 24.923037] __sys_sendmsg+0x7c/0xd0 [ 24.926628] __arm64_sys_sendmsg+0x2c/0x38 [ 24.930744] el0_svc_common.constprop.0+0x94/0x158 [ 24.935556] el0_svc_handler+0x34/0x90 [ 24.939322] el0_svc+0x8/0xc [ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863) [ 24.948336] ---[ end trace cebbb9dcbed3b56f ]--- Fixes: `f6ac858589` ("netfilter: nf_tables: unbind set in rule from commit path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-09 14:41:13 +02:00
Jakub Kicinski	414776621d	net/tls: prevent skb_orphan() from leaking TLS plain text with offload sk_validate_xmit_skb() and drivers depend on the sk member of struct sk_buff to identify segments requiring encryption. Any operation which removes or does not preserve the original TLS socket such as skb_orphan() or skb_clone() will cause clear text leaks. Make the TCP socket underlying an offloaded TLS connection mark all skbs as decrypted, if TLS TX is in offload mode. Then in sk_validate_xmit_skb() catch skbs which have no socket (or a socket with no validation) and decrypted flag set. Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and sk->sk_validate_xmit_skb are slightly interchangeable right now, they all imply TLS offload. The new checks are guarded by CONFIG_TLS_DEVICE because that's the option guarding the sk_buff->decrypted member. Second, smaller issue with orphaning is that it breaks the guarantee that packets will be delivered to device queues in-order. All TLS offload drivers depend on that scheduling property. This means skb_orphan_partial()'s trick of preserving partial socket references will cause issues in the drivers. We need a full orphan, and as a result netem delay/throttling will cause all TLS offload skbs to be dropped. Reusing the sk_buff->decrypted flag also protects from leaking clear text when incoming, decrypted skb is redirected (e.g. by TC). See commit `0608c69c9a` ("bpf: sk_msg, sock{map\|hash} redirect through ULP") for justification why the internal flag is safe. The only location which could leak the flag in is tcp_bpf_sendmsg(), which is taken care of by clearing the previously unused bit. v2: - remove superfluous decrypted mark copy (Willem); - remove the stale doc entry (Boris); - rely entirely on EOR marking to prevent coalescing (Boris); - use an internal sendpages flag instead of marking the socket (Boris). v3 (Willem): - reorganize the can_skb_orphan_partial() condition; - fix the flag leak-in through tcp_bpf_sendmsg. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:39:35 -07:00
Roman Mashak	e1fea322fc	net sched: update skbedit action for batched events operations Add get_fill_size() routine used to calculate the action size when building a batch of events. Fixes: `ca9b0e27e` ("pkt_action: add new action skbedit") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:37:06 -07:00
YueHaibing	018e5b4587	fq_codel: remove set but not used variables 'prev_ecn_mark' and 'prev_drop_count' Fixes gcc '-Wunused-but-set-variable' warning: net/sched/sch_fq_codel.c: In function fq_codel_dequeue: net/sched/sch_fq_codel.c:288:23: warning: variable prev_ecn_mark set but not used [-Wunused-but-set-variable] net/sched/sch_fq_codel.c:288:6: warning: variable prev_drop_count set but not used [-Wunused-but-set-variable] They are not used since commit `77ddaff218` ("fq_codel: Kill useless per-flow dropped statistic") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:32:19 -07:00
John Rutherford	6c9081a391	tipc: add loopback device tracking Since node internal messages are passed directly to the socket, it is not possible to observe those messages via tcpdump or wireshark. We now remedy this by making it possible to clone such messages and send the clones to the loopback interface. The clones are dropped at reception and have no functional role except making the traffic visible. The feature is enabled if network taps are active for the loopback device. pcap filtering restrictions require the messages to be presented to the receiving side of the loopback device. v3 - Function dev_nit_active used to check for network taps. - Procedure netif_rx_ni used to send cloned messages to loopback device. Signed-off-by: John Rutherford <john.rutherford@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:11:39 -07:00
Ivan Khoronzhuk	51650d33b2	net: sched: sch_taprio: fix memleak in error path for sched list parse In error case, all entries should be freed from the sched list before deleting it. For simplicity use rcu way. Fixes: `5a781ccbd1` ("tc: Add support for configuring the taprio scheduler") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:00:24 -07:00
wenxu	9a32669fec	netfilter: nf_tables_offload: support indr block call nftable support indr-block call. It makes nftable an offload vlan and tunnel device. nft add table netdev firewall nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; } nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0 nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; } nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0 Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:44:30 -07:00
wenxu	1150ab0f1b	flow_offload: support get multi-subsystem block It provide a callback list to find the blocks of tc and nft subsystems Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:44:30 -07:00
wenxu	4e481908c5	flow_offload: move tc indirect block to flow offload move tc indirect block to flow_offload and rename it to flow indirect block.The nf_tables can use the indr block architecture. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:44:30 -07:00
wenxu	e4da910211	cls_api: add flow_indr_block_call function This patch make indr_block_call don't access struct tc_indr_block_cb and tc_indr_block_dev directly Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:44:30 -07:00
wenxu	f843698857	cls_api: remove the tcf_block cache Remove the tcf_block in the tc_indr_block_dev for muti-subsystem support. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:44:30 -07:00
wenxu	242453c227	cls_api: modify the tc_indr_block_ing_cmd parameters. This patch make tc_indr_block_ing_cmd can't access struct tc_indr_block_dev and tc_indr_block_cb. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:44:30 -07:00
Edward Cree	323ebb61e3	net: use listified RX for handling GRO_NORMAL skbs When GRO decides not to coalesce a packet, in napi_frags_finish(), instead of passing it to the stack immediately, place it on a list in the napi struct. Then, at flush time (napi_complete_done(), napi_poll(), or napi_busy_loop()), call netif_receive_skb_list_internal() on the list. We'd like to do that in napi_gro_flush(), but it's not called if !napi->gro_bitmask, so we have to do it in the callers instead. (There are a handful of drivers that call napi_gro_flush() themselves, but it's not clear why, or whether this will affect them.) Because a full 64 packets is an inefficiently large batch, also consume the list whenever it exceeds gro_normal_batch, a new net/core sysctl that defaults to 8. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:22:29 -07:00
Alexey Dobriyan	9d2f112383	net: delete "register" keyword Delete long obsoleted "register" keyword. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:03:42 -07:00
Guillaume Nault	891584f48a	inet: frags: re-introduce skb coalescing for local delivery Before commit `d4289fcc9b` ("net: IP6 defrag: use rbtrees for IPv6 defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus generating many fragments) and running over an IPsec tunnel, reported more than 6Gbps throughput. After that patch, the same test gets only 9Mbps when receiving on a be2net nic (driver can make a big difference here, for example, ixgbe doesn't seem to be affected). By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing (IPv4 fragment coalescing was dropped by commit `14fe22e334` ("Revert "ipv4: use skb coalescing in defragmentation"")). Without fragment coalescing, be2net runs out of Rx ring entries and starts to drop frames (ethtool reports rx_drops_no_frags errors). Since the netperf traffic is only composed of UDP fragments, any lost packet prevents reassembly of the full datagram. Therefore, fragments which have no possibility to ever get reassembled pile up in the reassembly queue, until the memory accounting exeeds the threshold. At that point no fragment is accepted anymore, which effectively discards all netperf traffic. When reassembly timeout expires, some stale fragments are removed from the reassembly queue, so a few packets can be received, reassembled and delivered to the netperf receiver. But the nic still drops frames and soon the reassembly queue gets filled again with stale fragments. These long time frames where no datagram can be received explain why the performance drop is so significant. Re-introducing fragment coalescing is enough to get the initial performances again (6.6Gbps with be2net): driver doesn't drop frames anymore (no more rx_drops_no_frags errors) and the reassembly engine works at full speed. This patch is quite conservative and only coalesces skbs for local IPv4 and IPv6 delivery (in order to avoid changing skb geometry when forwarding). Coalescing could be extended in the future if need be, as more scenarios would probably benefit from it. [0]: Test configuration Sender: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow netserver -D -L fc00:2::1 Receiver: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6 Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 15:55:10 -07:00
David S. Miller	b3a598eb0d	This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Replace usage of strlcpy with strscpy, by Sven Eckelmann - Add OGMv2 per-interface queue and aggregations, by Linus Luessing (2 patches) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl1MHekWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoW8TEACox7vhtW4MS8QCzPySCU3f8V6m f+2PPlUbM4CqXcOPGw/jQng6PAtcb4gNPssm52GaxdeB9jFqkI/ELdSn5mCh+EcG QRrhf5DVruYyqBU2gNhovEe7SlJl8IJno6kFdAggaMngXnvlBzIr7n4FMIUKNFYn 6kFbA8pugBXXvhiRcuzs+l5iUxdKUxTsUNPyppyqnqb8lrb0/30/681dfq87PmcV zehEf8Ry23W7CVQv6YougVJvK0GUwysULsvm8Wc8FsOke7CeeQIPLEF2Pcrl/CFM mfynXVXngE41MPBC59eUcWBGlRYwkuwm4Q+YQ8OUjr5+X5YP06jR5Dh8u6KVAMuy QWGSwyrlXSsCE6BTxoijdJqsLzHDXCmYY0GQI2tEMCyDnL95CU3tuTk4vckusuf+ NlhHv7m+Bo0w9ztDUBifzNyURW9VgUCoOZfW9rdYRWjjN8Oe6wnMWCFttGnkg0qu zrCJn5mGvz3Vp434K0uGY1wOZincSdM6grBSgmZv1UNMaBdlfNroJsUqvz6IE5Fe iI5kqRXoUG+ftfwacgyFEK08HpcZHvJkNDiHHRlOPPZm75yE7mwreVUrGRaibywQ pzTEwM+H2MX9xF+osjiPVc197fxpnkX9fI2LhDOEiblaJbdiIxd+cgWRMFS1YPvd ANaFfAh58+gDWcRqhg== =Cbzf -----END PGP SIGNATURE----- Merge tag 'batadv-next-for-davem-20190808' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Replace usage of strlcpy with strscpy, by Sven Eckelmann - Add OGMv2 per-interface queue and aggregations, by Linus Luessing (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 11:28:40 -07:00
David S. Miller	f6649feb26	Here are some batman-adv bugfixes: - Fix netlink dumping of all mcast_flags buckets, by Sven Eckelmann - Fix deletion of RTR(4\|6) mcast list entries, by Sven Eckelmann -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl1MHO0WHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeobHvD/9RQLU/2zrbq22QIhLwEt2DeGg3 Z//qHgK/LhDnLJRHA0F14NlA/4vPtzbfIX8m0Ai/mrPrk0naDKEeM6TSBw8IHQEz bsibnHkwB+cdgqGtVKFEHqL5seaD5Nnlu89xIvgTyeImq2Zd7heOphN8lLmJPdem 3TugLIkbEqO4P+KU8fdjuA+N4GcRzaArsendH+3z4LrlXJnh4YWvook4/gSjQwWe ydF+e2ktZvNEj8Xl5vsDpykBEFll+k4BThW8toIMx4FCLiw1Hqedp/agwIbs5Tjc ccdYa/wuKYUAwz5rsrqgx3FpH6z0w6h0b3AB++cLaahmas9qx/K7lT68PsJcPz0E qhZmbw0snRO/N+TZQPVd1B9i0H3efAaQzzUu/xfL2RRA91ztB5O2+3HZF3PBzw/J u1Ewt56vv4LV0QmeQLzczKMBS47DPoARk8mLOx0dpGl6vvkQYWBFbn+gk/+uPalk K15Yozd3h/nSEX3SWk50X3+AU/p8l7rAYcrOrpWsRmQGl06XBbRDr9mRijMoHnTE LM+AKiZLvOS1Q/UT/C0YwL2i665aG3NqzqMXq+0Mtciu6Lsefu8+JOZFxcK4GXGU XJ+Kkygf/3/5dhu3Nxm4GchQQSLVJRn/i0nCKJeWS/Qxdvcalk8lPQPrX2a2l6GY kW2KxW/KjwayzLpwSQ== =g2Gm -----END PGP SIGNATURE----- Merge tag 'batadv-net-for-davem-20190808' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Fix netlink dumping of all mcast_flags buckets, by Sven Eckelmann - Fix deletion of RTR(4\|6) mcast list entries, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 11:25:39 -07:00
David S. Miller	13dfb3fa49	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Just minor overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 18:44:57 -07:00
Yifeng Sun	aa733660db	openvswitch: Print error when ovs_execute_actions() fails Currently in function ovs_dp_process_packet(), return values of ovs_execute_actions() are silently discarded. This patch prints out an debug message when error happens so as to provide helpful hints for debugging. Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 14:38:12 -07:00
Vladimir Oltean	93fa8587b2	net: dsa: sja1105: Fix memory leak on meta state machine error path When RX timestamping is enabled and two link-local (non-meta) frames are received in a row, this constitutes an error. The tagger is always caching the last link-local frame, in an attempt to merge it with the meta follow-up frame when that arrives. To recover from the above error condition, the initial cached link-local frame is dropped and the second frame in a row is cached (in expectance of the second meta frame). However, when dropping the initial link-local frame, its backing memory was being leaked. Fixes: `f3097be21b` ("net: dsa: sja1105: Add a state machine for RX timestamping") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 14:37:02 -07:00
Vladimir Oltean	f163fed276	net: dsa: sja1105: Fix memory leak on meta state machine normal path After a meta frame is received, it is associated with the cached sp->data->stampable_skb from the DSA tagger private structure. Cached means its refcount is incremented with skb_get() in order for dsa_switch_rcv() to not free it when the tagger .rcv returns NULL. The mistake is that skb_unref() is not the correct function to use. It will correctly decrement the refcount (which will go back to zero) but the skb memory will not be freed. That is the job of kfree_skb(), which also calls skb_unref(). But it turns out that freeing the cached stampable_skb is in fact not necessary. It is still a perfectly valid skb, and now it is even annotated with the partial RX timestamp. So remove the skb_copy() altogether and simply pass the stampable_skb with a refcount of 1 (incremented by us, decremented by dsa_switch_rcv) up the stack. Fixes: `f3097be21b` ("net: dsa: sja1105: Add a state machine for RX timestamping") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 14:37:02 -07:00
John Hurley	48e584ac58	net: sched: add ingress mirred action to hardware IR TC mirred actions (redirect and mirred) can send to egress or ingress of a device. Currently only egress is used for hw offload rules. Modify the intermediate representation for hw offload to include mirred actions that go to ingress. This gives drivers access to such rules and can decide whether or not to offload them. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 14:24:21 -07:00
John Hurley	fb1b775a24	net: sched: add skbedit of ptype action to hardware IR TC rules can impliment skbedit actions. Currently actions that modify the skb mark are passed to offloading drivers via the hardware intermediate representation in the flow_offload API. Extend this to include skbedit actions that modify the packet type of the skb. Such actions may be used to set the ptype to HOST when redirecting a packet to ingress. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 14:24:21 -07:00
Dave Taht	77ddaff218	fq_codel: Kill useless per-flow dropped statistic It is almost impossible to get anything other than a 0 out of flow->dropped statistic with a tc class dump, as it resets to 0 on every round. It also conflates ecn marks with drops. It would have been useful had it kept a cumulative drop count, but it doesn't. This patch doesn't change the API, it just stops tracking a stat and state that is impossible to measure and nobody uses. Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 14:18:20 -07:00
Dave Taht	ae697f3bf7	Increase fq_codel count in the bulk dropper In the field fq_codel is often used with a smaller memory or packet limit than the default, and when the bulk dropper is hit, the drop pattern bifircates into one that more slowly increases the codel drop rate and hits the bulk dropper more than it should. The scan through the 1024 queues happens more often than it needs to. This patch increases the codel count in the bulk dropper, but does not change the drop rate there, relying on the next codel round to deliver the next packet at the original drop rate (after that burst of loss), then escalate to a higher signaling rate. Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 14:18:20 -07:00
Vivien Didelot	48e2331197	net: dsa: dump CPU port regs through master Merge the CPU port registers dump into the master interface registers dump through ethtool, by nesting the ethtool_drvinfo and ethtool_regs structures of the CPU port into the dump. drvinfo->regdump_len will contain the full data length, while regs->len will contain only the master interface registers dump length. This allows for example to dump the CPU port registers on a ZII Dev C board like this: # ethtool -d eth1 0x004: 0x00000000 0x008: 0x0a8000aa 0x010: 0x01000000 0x014: 0x00000000 0x024: 0xf0000102 0x040: 0x6d82c800 0x044: 0x00000020 0x064: 0x40000000 0x084: RCR (Receive Control Register) 0x47c00104 MAX_FL (Maximum frame length) 1984 FCE (Flow control enable) 0 BC_REJ (Broadcast frame reject) 0 PROM (Promiscuous mode) 0 DRT (Disable receive on transmit) 0 LOOP (Internal loopback) 0 0x0c4: TCR (Transmit Control Register) 0x00000004 RFC_PAUSE (Receive frame control pause) 0 TFC_PAUSE (Transmit frame control pause) 0 FDEN (Full duplex enable) 1 HBC (Heartbeat control) 0 GTS (Graceful transmit stop) 0 0x0e4: 0x76735d6d 0x0e8: 0x7e9e8808 0x0ec: 0x00010000 . . . 88E6352 Switch Port Registers ------------------------------ 00: Port Status 0x4d04 Pause Enabled 0 My Pause 1 802.3 PHY Detected 0 Link Status Up Duplex Full Speed 100 or 200 Mbps EEE Enabled 0 Transmitter Paused 0 Flow Control 0 Config Mode 0x4 01: Physical Control 0x003d RGMII Receive Timing Control Default RGMII Transmit Timing Control Default 200 BASE Mode 100 Flow Control's Forced value 0 Force Flow Control 0 Link's Forced value Up Force Link 1 Duplex's Forced value Full Force Duplex 1 Force Speed 100 or 200 Mbps . . . Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 14:08:32 -07:00
Roman Mashak	b35475c549	net sched: update vlan action for batched events operations Add get_fill_size() routine used to calculate the action size when building a batch of events. Fixes: `c7e2b9689` ("sched: introduce vlan action") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 14:05:39 -07:00
Ido Schimmel	b19d955055	drop_monitor: Use pre_doit / post_doit hooks Each operation from user space should be protected by the global drop monitor mutex. Use the pre_doit / post_doit hooks to take / release the lock instead of doing it explicitly in each function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 12:37:56 -07:00
Ido Schimmel	965100966e	drop_monitor: Add extack support Add various extack messages to make drop_monitor more user friendly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 12:37:56 -07:00
Ido Schimmel	ff3818ca39	drop_monitor: Avoid multiple blank lines Remove multiple blank lines which are visually annoying and useless. This suppresses the "Please don't use multiple blank lines" checkpatch messages. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 12:37:56 -07:00
Ido Schimmel	01921d53f8	drop_monitor: Document scope of spinlock While 'per_cpu_dm_data' is a per-CPU variable, its 'skb' and 'send_timer' fields can be accessed concurrently by the CPU sending the netlink notification to user space from the workqueue and the CPU tracing kfree_skb(). This spinlock is meant to protect against that. Document its scope and suppress the checkpatch message "spinlock_t definition without comment". Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 12:37:56 -07:00
Ido Schimmel	dbf896b70d	drop_monitor: Rename and document scope of mutex The 'trace_state_mutex' does not only protect the global 'trace_state' variable, but also the global 'hw_stats_list'. Subsequent patches are going add more operations from user space to drop_monitor and these all need to be mutually exclusive. Rename 'trace_state_mutex' to the more fitting 'net_dm_mutex' name and document its scope. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 12:37:56 -07:00
Ido Schimmel	2230a7ef51	drop_monitor: Use correct error code The error code 'ENOTSUPP' is reserved for use with NFS. Use 'EOPNOTSUPP' instead. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 12:37:56 -07:00
Marek Vasut	267df70fe8	net: dsa: ksz: Drop NET_DSA_TAG_KSZ9477 This Kconfig option is unused, drop it. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: David S. Miller <davem@davemloft.net> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 11:59:17 -07:00
Nikolay Aleksandrov	091adf9ba6	net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER Most of the bridge device's vlan init bugs come from the fact that its default pvid is created at the wrong time, way too early in ndo_init() before the device is even assigned an ifindex. It introduces a bug when the bridge's dev_addr is added as fdb during the initial default pvid creation the notification has ifindex/NDA_MASTER both equal to 0 (see example below) which really makes no sense for user-space[0] and is wrong. Usually user-space software would ignore such entries, but they are actually valid and will eventually have all necessary attributes. It makes much more sense to send a notification after the device has registered and has a proper ifindex allocated rather than before when there's a chance that the registration might still fail or to receive it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush from br_vlan_flush() since that case can no longer happen. At NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything depending why it was called (if called due to NETDEV_REGISTER error it'll still be == 1, otherwise it could be any value changed during the device life time). For the demonstration below a small change to iproute2 for printing all fdb notifications is added, because it contained a workaround not to show entries with ifindex == 0. Command executed while monitoring: $ ip l add br0 type bridge Before (both ifindex and master == 0): $ bridge monitor fdb 36:7e:8a:b3:56:ba dev * vlan 1 master * permanent After (proper br0 ifindex): $ bridge monitor fdb e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER v3: send the correct v2 patch with all changes (stub should return 0) v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in the br_vlan_bridge_event stub when bridge vlans are disabled [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389 Reported-by: michael-dev <michael-dev@fami-braun.de> Fixes: `5be5a2df40` ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-05 13:32:53 -07:00
Ursula Braun	cd2063604e	net/smc: avoid fallback in case of non-blocking connect FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN triggers a fallback to TCP if the socket is in state SMC_INIT. But if a nonblocking connect is already started, fallback to TCP is no longer possible, even though the socket may still be in state SMC_INIT. And if a nonblocking connect is already started, a listen() call does not make sense. Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com Fixes: `50717a37db` ("net/smc: nonblocking connect rework") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-05 13:24:37 -07:00
Ursula Braun	f9cedf1a9b	net/smc: do not schedule tx_work in SMC_CLOSED state The setsockopts options TCP_NODELAY and TCP_CORK may schedule the tx worker. Make sure the socket is not yet moved into SMC_CLOSED state (for instance by a shutdown SHUT_RDWR call). Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com Fixes: `01d2f7e2cd` ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-05 13:23:05 -07:00
David Ahern	43a4b60d04	ipv6: have a single rcu unlock point in __ip6_rt_update_pmtu Simplify the unlock path in __ip6_rt_update_pmtu by using a single point where rcu_read_unlock is called. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-05 13:17:54 -07:00
David Ahern	cff6a327d7	ipv6: Fix unbalanced rcu locking in rt6_update_exception_stamp_rt The nexthop path in rt6_update_exception_stamp_rt needs to call rcu_read_unlock if it fails to find a fib6_nh match rather than just returning. Fixes: `e659ba31d8` ("ipv6: Handle all fib6_nh in a nexthop in exception handling") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-05 13:16:58 -07:00
Jakub Kicinski	5d92e631b8	net/tls: partially revert fix transition through disconnect with close Looks like we were slightly overzealous with the shutdown() cleanup. Even though the sock->sk_state can reach CLOSED again, socket->state will not got back to SS_UNCONNECTED once connections is ESTABLISHED. Meaning we will see EISCONN if we try to reconnect, and EINVAL if we try to listen. Only listen sockets can be shutdown() and reused, but since ESTABLISHED sockets can never be re-connected() or used for listen() we don't need to try to clean up the ULP state early. Fixes: `32857cf57f` ("net/tls: fix transition through disconnect with close") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-05 13:15:30 -07:00
Jesper Dangaard Brouer	065af35547	net: fix bpf_xdp_adjust_head regression for generic-XDP When generic-XDP was moved to a later processing step by commit `458bf2f224` ("net: core: support XDP generic on stacked devices.") a regression was introduced when using bpf_xdp_adjust_head. The issue is that after this commit the skb->network_header is now changed prior to calling generic XDP and not after. Thus, if the header is changed by XDP (via bpf_xdp_adjust_head), then skb->network_header also need to be updated again. Fix by calling skb_reset_network_header(). Fixes: `458bf2f224` ("net: core: support XDP generic on stacked devices.") Reported-by: Brandon Cazander <brandon.cazander@multapplied.net> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-05 11:17:40 -07:00
Dmytro Linkin	7be8ef2cdb	net: sched: use temporary variable for actions indexes Currently init call of all actions (except ipt) init their 'parm' structure as a direct pointer to nla data in skb. This leads to race condition when some of the filter actions were initialized successfully (and were assigned with idr action index that was written directly into nla data), but then were deleted and retried (due to following action module missing or classifier-initiated retry), in which case action init code tries to insert action to idr with index that was assigned on previous iteration. During retry the index can be reused by another action that was inserted concurrently, which causes unintended action sharing between filters. To fix described race condition, save action idr index to temporary stack-allocated variable instead on nla data. Fixes: `0190c1d452` ("net: sched: atomically check-allocate action") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-05 10:59:14 -07:00
Florian Westphal	589b474a4b	netfilter: nf_flow_table: fix offload for flows that are subject to xfrm This makes the previously added 'encap test' pass. Because its possible that the xfrm dst entry becomes stale while such a flow is offloaded, we need to call dst_check() -- the notifier that handles this for non-tunneled traffic isn't sufficient, because SA or or policies might have changed. If dst becomes stale the flow offload entry will be tagged for teardown and packets will be passed to 'classic' forwarding path. Removing the entry right away is problematic, as this would introduce a race condition with the gc worker. In case flow is long-lived, it could eventually be offloaded again once the gc worker removes the entry from the flow table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-05 11:29:50 +02:00
Linus Lüssing	9cb9a17813	batman-adv: BATMAN_V: aggregate OGMv2 packets Instead of transmitting individual OGMv2 packets from the aggregation queue merge those OGMv2 packets into a single one and transmit this aggregate instead. This reduces overhead as it saves an ethernet header and a transmission per aggregated OGMv2 packet. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2019-08-04 22:22:00 +02:00
Linus Lüssing	f89255a02f	batman-adv: BATMAN_V: introduce per hard-iface OGMv2 queues In preparation for the OGMv2 packet aggregation, hold OGMv2 packets for up to BATADV_MAX_AGGREGATION_MS milliseconds (100ms) on per hard-interface queues, before transmitting. This allows us to later squash multiple OGMs into a single frame and transmission for reduced overhead. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2019-08-04 22:22:00 +02:00
Junwei Hu	1b90af292e	ipvs: Improve robustness to the ipvs sysctl The ipvs module parse the user buffer and save it to sysctl, then check if the value is valid. invalid value occurs over a period of time. Here, I add a variable, struct ctl_table tmp, used to read the value from the user buffer, and save only when it is valid. I delete proc_do_sync_mode and use extra1/2 in table for the proc_dointvec_minmax call. Fixes: `f73181c828` ("ipvs: add support for sync threads") Signed-off-by: Junwei Hu <hujunwei4@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-03 18:39:16 +02:00
Matteo Croce	e84fb4b366	netfilter: conntrack: use shared sysctl constants Use shared sysctl variables for zero and one constants, as in commit `eec4844fae` ("proc/sysctl: add shared variables for range check") Fixes: `8f14c99c7e` ("netfilter: conntrack: limit sysctl setting for boolean options") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-03 18:39:08 +02:00
Fernando Fernandez Mancera	8c0bb78738	netfilter: synproxy: rename mss synproxy_options field After introduce "mss_encode" field in the synproxy_options struct the field "mss" is a little confusing. It has been renamed to "mss_option". Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-03 18:39:08 +02:00
Dexuan Cui	685703b497	hv_sock: Fix hang when a connection is closed There is a race condition for an established connection that is being closed by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the 'remove_sock' is false): 1 for the initial value; 1 for the sk being in the bound list; 1 for the sk being in the connected list; 1 for the delayed close_work. After hvs_release() finishes, __vsock_release() -> sock_put(sk) may decrease the refcnt to 3. Concurrently, hvs_close_connection() runs in another thread: calls vsock_remove_sock() to decrease the refcnt by 2; call sock_put() to decrease the refcnt to 0, and free the sk; next, the "release_sock(sk)" may hang due to use-after-free. In the above, after hvs_release() finishes, if hvs_close_connection() runs faster than "__vsock_release() -> sock_put(sk)", then there is not any issue, because at the beginning of hvs_close_connection(), the refcnt is still 4. The issue can be resolved if an extra reference is taken when the connection is established. Fixes: `a9eeb998c2` ("hv_sock: Add support for delayed close") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-02 17:26:27 -07:00
Jon Maloy	7c5b420559	tipc: reduce risk of wakeup queue starvation In commit `365ad353c2` ("tipc: reduce risk of user starvation during link congestion") we allowed senders to add exactly one list of extra buffers to the link backlog queues during link congestion (aka "oversubscription"). However, the criteria for when to stop adding wakeup messages to the input queue when the overload abates is inaccurate, and may cause starvation problems during very high load. Currently, we stop adding wakeup messages after 10 total failed attempts where we find that there is no space left in the backlog queue for a certain importance level. The counter for this is accumulated across all levels, which may lead the algorithm to leave the loop prematurely, although there may still be plenty of space available at some levels. The result is sometimes that messages near the wakeup queue tail are not added to the input queue as they should be. We now introduce a more exact algorithm, where we keep adding wakeup messages to a level as long as the backlog queue has free slots for the corresponding level, and stop at the moment there are no more such slots or when there are no more wakeup messages to dequeue. Fixes: `365ad35` ("tipc: reduce risk of user starvation during link congestion") Reported-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-01 18:19:28 -04:00
Taras Kondratiuk	4da5f0018e	tipc: compat: allow tipc commands without arguments Commit `2753ca5d90` ("tipc: fix uninit-value in tipc_nl_compat_doit") broke older tipc tools that use compat interface (e.g. tipc-config from tipcutils package): % tipc-config -p operation not supported The commit started to reject TIPC netlink compat messages that do not have attributes. It is too restrictive because some of such messages are valid (they don't need any arguments): % grep 'tx none' include/uapi/linux/tipc_config.h #define TIPC_CMD_NOOP 0x0000 /* tx none, rx none / #define TIPC_CMD_GET_MEDIA_NAMES 0x0002 / tx none, rx media_name(s) / #define TIPC_CMD_GET_BEARER_NAMES 0x0003 / tx none, rx bearer_name(s) / #define TIPC_CMD_SHOW_PORTS 0x0006 / tx none, rx ultra_string / #define TIPC_CMD_GET_REMOTE_MNG 0x4003 / tx none, rx unsigned / #define TIPC_CMD_GET_MAX_PORTS 0x4004 / tx none, rx unsigned / #define TIPC_CMD_GET_NETID 0x400B / tx none, rx unsigned / #define TIPC_CMD_NOT_NET_ADMIN 0xC001 / tx none, rx none */ This patch relaxes the original fix and rejects messages without arguments only if such arguments are expected by a command (reg_type is non zero). Fixes: `2753ca5d90` ("tipc: fix uninit-value in tipc_nl_compat_doit") Cc: stable@vger.kernel.org Signed-off-by: Taras Kondratiuk <takondra@cisco.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-01 18:14:00 -04:00
Nikolay Aleksandrov	3247b27204	net: bridge: mcast: add delete due to fast-leave mdb flag In user-space there's no way to distinguish why an mdb entry was deleted and that is a problem for daemons which would like to keep the mdb in sync with remote ends (e.g. mlag) but would also like to converge faster. In almost all cases we'd like to age-out the remote entry for performance and convergence reasons except when fast-leave is enabled. In that case we want explicit immediate remote delete, thus add mdb flag which is set only when the entry is being deleted due to fast-leave. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-31 19:13:40 -04:00
Nikolay Aleksandrov	5c725b6b65	net: bridge: mcast: don't delete permanent entries when fast leave is enabled When permanent entries were introduced by the commit below, they were exempt from timing out and thus igmp leave wouldn't affect them unless fast leave was enabled on the port which was added before permanent entries existed. It shouldn't matter if fast leave is enabled or not if the user added a permanent entry it shouldn't be deleted on igmp leave. Before: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show $ After: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent Fixes: `ccb1c31a7a` ("bridge: add flags to distinguish permanent mdb entires") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-31 19:03:01 -04:00
David S. Miller	ac5fe22636	We have a reasonably large number of changes: * lots more HE (802.11ax) support, particularly things relevant for the the AP side, but also mesh support * debugfs cleanups from Greg * some more work on extended key ID * start using genl parallel_ops, as preparation for weaning ourselves off RTNL and getting parallelism * various other changes all over -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl1BtsAACgkQB8qZga/f l8QTdQ//bD3NLitiIW4rBmC/w9str2TRpUbRN8Hf9xTnRh1smLGQop8jI8xpYdAE hXFlWn4glCkTkZrqtT8IDMcERb2YPHI94L7lMR1L6dXH7e1tiBZ7Qum5+rojLUCQ /XXZJnVa9AbdzBDtkhzCjkN8dCldMnkId0m6vkGdFre/b70FLDg2GlolqIchbDCz GxeKaw/uTLwu9nxEJhspDmiXDQtQd1ZsGNZRrovQ2nUuUfvX9sU54OmaoDHhqrUM iSwFZJAhrCV7nidOLs2X1rVs8hRymbrYnGu6NdUlfTQMqaOVgWhqniM1RMnJnnVi Ec/1OQg4KIj2rjJIXW/3QXAzAO6TDwV8xNk9al3m+yAkkSXR7tXP01plNUoMU4UW YxcaPD341H919GIqa71lieeEMyi7XIKKFfaZvGFLiDzidKKi0JkDuC+1w5UOJrbj PM/dzvKqvwOMEa5XahjrUQLsiGMgxBkDo36+F4rfaelzDKPuJBOokDS23vg/lYf5 2v74WbeJVc45cAhSwp5/0RTsG0xhmFImfrE66VNB1vcJUVLvHTuATfXsBbRahOz/ SglGBAGqigxJwIeYfgec3lPNfdV+0LwYnPAEjBIjCO8qlGvst4jLroHg7ayApr2E JfUbaWtpEb9cgAz7X6tEjv/BAnsRWPzpxb8Nm6JoZttTtA2VakE= =eUnh -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a reasonably large number of changes: * lots more HE (802.11ax) support, particularly things relevant for the the AP side, but also mesh support * debugfs cleanups from Greg * some more work on extended key ID * start using genl parallel_ops, as preparation for weaning ourselves off RTNL and getting parallelism * various other changes all over ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-31 08:59:41 -07:00
David S. Miller	f86a677e57	Just a few fixes: * revert NETIF_F_LLTX usage as it caused problems * avoid warning on WMM parameters from AP that are too short * fix possible null-ptr dereference in hwsim * fix interface combinations with 4-addr and crypto control -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl1BfGcACgkQB8qZga/f l8TE/A/9G98DIi7UpuseyrW1y+3Bab5G+N4JhNDYMF0fC+f+6VZPsQRQi0aH2GVE fBDnFQogiqIY8WwgQ5RoyUjOux43INpNp3uPjOCzcdjj9+D+lBX/27yhNVBVYK5H Uxq08W2Mv+GWcswa5NMGUF9FlARbIMSu1BsDML031LzSckNvSLwMuANSx/1/d/gr rUa1waaGeC2yT/baefNaHDiqYBj/6Xm6U0Adl1PphGJZ587UjlylCwozNNIrUIwW PeysRXIUakePEqNs0Kl6U6lUJoH3g1KB+fLl5ea94L2aBHHRRaBAG1yROidg1/JG fmz8jostoasG+wuycNSdETVgY3TT37Hv0R7rRB0Wj5QvrdXqgyrAZSR97dMWR9wR fZLZCXtYnxOVZwKUlDusSjDFm9CKdhM+KqINHXBa6RRWpQjFYn4AvbQlJEZrbyuJ lBOvXAvmF1fsX5x9hmDLRM7Irtb9Awb7eHbT+T5c+BtWKvnR2GsujMy2f6TY+swN WPmER6MtQ80VPk9xpLVxU02vLs0EIdoPxE7I348ozDmW6sln0kRI1Pp7uOqDO1RC sX6uA8bSTp9u2tKsgevkHqiz3YiJOqmy475tcOg9F/CefsULZzVfBE0bsU4v0jnJ LTXJS2GtmPhG4AJFn+qiqvvOzJGhAJAEsvmPvUgH0AmB4m0+Cqg= =rE1e -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just a few fixes: * revert NETIF_F_LLTX usage as it caused problems * avoid warning on WMM parameters from AP that are too short * fix possible null-ptr dereference in hwsim * fix interface combinations with 4-addr and crypto control ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-31 08:51:34 -07:00
David S. Miller	fa9586aff9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains Netfilter fixes for your net tree: 1) memleak in ebtables from the error path for the 32/64 compat layer, from Florian Westphal. 2) Fix inverted meta ifname/ifidx matching when no interface is set on either from the input/output path, from Phil Sutter. 3) Remove goto label in nft_meta_bridge, also from Phil. 4) Missing include guard in xt_connlabel, from Masahiro Yamada. 5) Two patch to fix ipset destination MAC matching coming from Stephano Brivio, via Jozsef Kadlecsik. 6) Fix set rename and listing concurrency problem, from Shijie Luo. Patch also coming via Jozsef Kadlecsik. 7) ebtables 32/64 compat missing base chain policy in rule count, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-31 08:49:09 -07:00
Shay Bar	f39b07fdfb	mac80211: HE STA disassoc due to QOS NULL not sent In case of HE AP-STA link, ieee80211_send_nullfunc() will not send the QOS NULL packet to check if AP is still associated. In this case, probe_send_count will be non-zero and ieee80211_sta_work() will later disassociate the AP, even though no packet was ever sent. Fix this by decrementing probe_send_count and not calling ieee80211_send_nullfunc() in case of HE link, so that we still wait for some time for the AP beacon to reappear and don't disconnect right away. Signed-off-by: Shay Bar <shay.bar@celeno.com> Link: https://lore.kernel.org/r/20190703131848.22879-1-shay.bar@celeno.com [clarify commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-31 13:26:41 +02:00
John Crispin	1ced169cc1	mac80211: allow setting spatial reuse parameters from bss_conf Store the OBSS PD parameters inside bss_conf when bringing up an AP and/or when a station connects to an AP. This allows the driver to configure the HW accordingly. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190730163701.18836-3-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-31 11:00:52 +02:00
Johannes Berg	6d4dd4ef1a	nl80211: add strict start type Add a strict start type so all new attributes starting from NL80211_ATTR_HE_OBSS_PD are validated strictly. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-31 11:00:52 +02:00
John Crispin	796e90f42b	cfg80211: add support for parsing OBBS_PD attributes Add the data structure, policy and parsing code allowing userland to send the OBSS PD information into the kernel. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190730163701.18836-2-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-31 11:00:52 +02:00
Karthikeyan Periyasamy	52dba8d7d5	mac80211: reject zero MAC address in add station This came up in fuzz testing, and really we don't consider all-zeroes to be a valid MAC address in most places, so also reject it here to avoid confusion later on. Signed-off-by: Karthikeyan Periyasamy <periyasa@codeaurora.org> Link: https://lore.kernel.org/r/1563959770-21570-1-git-send-email-periyasa@codeaurora.org [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-31 11:00:52 +02:00
Johannes Berg	50508d941c	cfg80211: use parallel_ops for genl Over time, we really need to get rid of all of our global locking. One of the things needed is to use parallel_ops. This isn't really the most important (RTNL is much more important) but OTOH we just keep adding uses of genl_family_attrbuf() now. Use .parallel_ops to disallow this. Reviewed-By: Denis Kenzior <denkenz@gmail.com> Link: https://lore.kernel.org/r/20190729143109.18683-1-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-31 11:00:46 +02:00
Colin Ian King	f12cac539f	mac80211: add missing null return check from call to ieee80211_get_sband The return from ieee80211_get_sband can potentially be a null pointer, so it seems prudent to add a null check to avoid a null pointer dereference on sband. Addresses-Coverity: ("Dereference null return") Fixes: `2ab4587675` ("mac80211: add support for the ADDBA extension element") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20190730143205.14261-1-colin.king@canonical.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-31 10:51:17 +02:00
Petar Penkov	70d6624431	bpf: add bpf_tcp_gen_syncookie helper This helper function allows BPF programs to try to generate SYN cookies, given a reference to a listener socket. The function works from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a socket in both cases. Signed-off-by: Petar Penkov <ppenkov@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-07-30 21:03:05 -07:00
Petar Penkov	9349d600fb	tcp: add skb-less helpers to retrieve SYN cookie This patch allows generation of a SYN cookie before an SKB has been allocated, as is the case at XDP. Signed-off-by: Petar Penkov <ppenkov@google.com> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-07-30 21:03:05 -07:00
Petar Penkov	965112785e	tcp: tcp_syn_flood_action read port from socket This allows us to call this function before an SKB has been allocated. Signed-off-by: Petar Penkov <ppenkov@google.com> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-07-30 21:03:05 -07:00
Tristram Ha	016e43a26b	net: dsa: ksz: Add KSZ8795 tag code Add DSA tag code for Microchip KSZ8795 switch. The switch is simpler and the tag is only 1 byte, instead of 2 as is the case with KSZ9477. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: David S. Miller <davem@davemloft.net> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 15:12:50 -07:00
Stefano Garzarella	0038ff357f	vsock/virtio: change the maximum packet size allowed Since now we are able to split packets, we can avoid limiting their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE. Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 15:00:00 -07:00
Stefano Garzarella	6dbd3e66e7	vhost/vsock: split packets to send using multiple buffers If the packets to sent to the guest are bigger than the buffer available, we can split them, using multiple buffers and fixing the length in the packet header. This is safe since virtio-vsock supports only stream sockets. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 15:00:00 -07:00
Stefano Garzarella	9632e9f61b	vsock/virtio: fix locking in virtio_transport_inc_tx_pkt() fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use the same spinlock also if we are in the TX path. Move also buf_alloc under the same lock. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 15:00:00 -07:00
Stefano Garzarella	b89d882dc9	vsock/virtio: reduce credit update messages In order to reduce the number of credit update messages, we send them only when the space available seen by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 15:00:00 -07:00
Stefano Garzarella	473c7391ce	vsock/virtio: limit the memory used per-socket Since virtio-vsock was introduced, the buffers filled by the host and pushed to the guest using the vring, are directly queued in a per-socket list. These buffers are preallocated by the guest with a fixed size (4 KB). The maximum amount of memory used by each socket should be controlled by the credit mechanism. The default credit available per-socket is 256 KB, but if we use only 1 byte per packet, the guest can queue up to 262144 of 4 KB buffers, using up to 1 GB of memory per-socket. In addition, the guest will continue to fill the vring with new 4 KB free buffers to avoid starvation of other sockets. This patch mitigates this issue copying the payload of small packets (< 128 bytes) into the buffer of last packet queued, in order to avoid wasting memory. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 15:00:00 -07:00
Arnd Bergmann	055d88242a	compat_ioctl: pppoe: fix PPPOEIOCSFWD handling Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in linux-2.5.69 along with hundreds of other commands, but was always broken sincen only the structure is compatible, but the command number is not, due to the size being sizeof(size_t), or at first sizeof(sizeof((struct sockaddr_pppox)), which is different on 64-bit architectures. Guillaume Nault adds: And the implementation was broken until 2016 (see `29e73269aa` ("pppoe: fix reference counting in PPPoE proxy")), and nobody ever noticed. I should probably have removed this ioctl entirely instead of fixing it. Clearly, it has never been used. Fix it by adding a compat_ioctl handler for all pppoe variants that translates the command number and then calls the regular ioctl function. All other ioctl commands handled by pppoe are compatible between 32-bit and 64-bit, and require compat_ptr() conversion. This should apply to all stable kernels. Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 14:42:13 -07:00
Jon Maloy	2948a1fcd7	tipc: fix unitilized skb list crash Our test suite somtimes provokes the following crash: Description of problem: [ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8 [ 1092.605072] PGD 0 P4D 0 [ 1092.607620] Oops: 0000 [#1] SMP PTI [ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1 [ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018 [ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc] [ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7 [ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282 [ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045 [ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000 [ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a [ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000 [ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a [ 1092.692614] FS: 0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000 [ 1092.700702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0 [ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1092.727843] PKRU: 55555554 [ 1092.730556] Call Trace: [ 1092.733010] <IRQ> [ 1092.735034] tipc_sk_filter_rcv+0x7ca/0xb80 [tipc] [ 1092.739828] ? __kmalloc_node_track_caller+0x1cb/0x290 [ 1092.744974] ? dev_hard_start_xmit+0xa5/0x210 [ 1092.749332] tipc_sk_rcv+0x389/0x640 [tipc] [ 1092.753519] tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc] [ 1092.758224] tipc_rcv+0x57a/0xf20 [tipc] [ 1092.762154] ? ktime_get_real_ts64+0x40/0xe0 [ 1092.766432] ? tpacket_rcv+0x50/0x9f0 [ 1092.770098] tipc_l2_rcv_msg+0x4a/0x70 [tipc] [ 1092.774452] __netif_receive_skb_core+0xb62/0xbd0 [ 1092.779164] ? enqueue_entity+0xf6/0x630 [ 1092.783084] ? kmem_cache_alloc+0x158/0x1c0 [ 1092.787272] ? __build_skb+0x25/0xd0 [ 1092.790849] netif_receive_skb_internal+0x42/0xf0 [ 1092.795557] napi_gro_receive+0xba/0xe0 [ 1092.799417] mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core] [ 1092.804564] mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core] [ 1092.809536] mlx5e_napi_poll+0xb2/0xce0 [mlx5_core] [ 1092.814415] ? __wake_up_common_lock+0x89/0xc0 [ 1092.818861] net_rx_action+0x149/0x3b0 [ 1092.822616] __do_softirq+0xe3/0x30a [ 1092.826193] irq_exit+0x100/0x110 [ 1092.829512] do_IRQ+0x85/0xd0 [ 1092.832483] common_interrupt+0xf/0xf [ 1092.836147] </IRQ> [ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0 [ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f [ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd [ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f [ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000 [ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940 [ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78 [ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000 [ 1092.905196] ? cpuidle_enter_state+0x92/0x2a0 [ 1092.909555] do_idle+0x236/0x280 [ 1092.912785] cpu_startup_entry+0x6f/0x80 [ 1092.916715] start_secondary+0x1a7/0x200 [ 1092.920642] secondary_startup_64+0xb7/0xc0 [...] The reason is that the skb list tipc_socket::mc_method.deferredq only is initialized for connectionless sockets, while nothing stops arriving multicast messages from being filtered by connection oriented sockets, with subsequent access to the said list. We fix this by initializing the list unconditionally at socket creation. This eliminates the crash, while the message still is dropped further down in tipc_sk_filter_rcv() as it should be. Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 14:39:36 -07:00
Jonathan Lemon	b54c9d5bd6	net: Use skb_frag_off accessors Use accessor functions for skb fragment's page_offset instead of direct references, in preparation for bvec conversion. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 14:21:32 -07:00
Xin Long	a64e59c72c	sctp: factor out sctp_connect_add_peer In this function factored out from sctp_sendmsg_new_asoc() and __sctp_connect(), it adds a peer with the other addr into the asoc after this asoc is created with the 1st addr. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 14:18:14 -07:00
Xin Long	f26f995122	sctp: factor out sctp_connect_new_asoc In this function factored out from sctp_sendmsg_new_asoc() and __sctp_connect(), it creates the asoc and adds a peer with the 1st addr. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 14:18:14 -07:00
Xin Long	dd8378b3af	sctp: clean up __sctp_connect __sctp_connect is doing quit similar things as sctp_sendmsg_new_asoc. To factor out common functions, this patch is to clean up their code to make them look more similar: 1. create the asoc and add a peer with the 1st addr. 2. add peers with the other addrs into this asoc one by one. while at it, also remove the unused 'addrcnt'. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 14:18:14 -07:00
Xin Long	f40f1177c3	sctp: check addr_size with sa_family_t size in __sctp_setsockopt_connectx Now __sctp_connect() is called by __sctp_setsockopt_connectx() and sctp_inet_connect(), the latter has done addr_size check with size of sa_family_t. In the next patch to clean up __sctp_connect(), we will remove addr_size check with size of sa_family_t from __sctp_connect() for the 1st address. So before doing that, __sctp_setsockopt_connectx() should do this check first, as sctp_inet_connect() does. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 14:18:14 -07:00
Xin Long	4c31bc6b1e	sctp: only copy the available addr data in sctp_transport_init 'addr' passed to sctp_transport_init is not always a whole size of union sctp_addr, like the path: sctp_sendmsg() -> sctp_sendmsg_new_asoc() -> sctp_assoc_add_peer() -> sctp_transport_new() -> sctp_transport_init() In the next patches, we will also pass the address length of data only to sctp_assoc_add_peer(). So sctp_transport_init() should copy the only available data from addr to peer->ipaddr, instead of 'peer->ipaddr = *addr' which may cause slab-out-of-bounds. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 14:18:14 -07:00
David Howells	1db88c5343	rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto rxkad sometimes triggers a warning about oversized stack frames when building with clang for a 32-bit architecture: net/rxrpc/rxkad.c:243:12: error: stack frame size of 1088 bytes in function 'rxkad_secure_packet' [-Werror,-Wframe-larger-than=] net/rxrpc/rxkad.c:501:12: error: stack frame size of 1088 bytes in function 'rxkad_verify_packet' [-Werror,-Wframe-larger-than=] The problem is the combination of SYNC_SKCIPHER_REQUEST_ON_STACK() in rxkad_verify_packet()/rxkad_secure_packet() with the relatively large scatterlist in rxkad_verify_packet_1()/rxkad_secure_packet_encrypt(). The warning does not show up when using gcc, which does not inline the functions as aggressively, but the problem is still the same. Allocate the cipher buffers from the slab instead, caching the allocated packet crypto request memory used for DATA packet crypto in the rxrpc_call struct. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-30 10:32:35 -07:00
David Howells	c69565ee66	rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packet Fix the fact that a notification isn't sent to the recvmsg side to indicate a call failed when sendmsg() fails to transmit a DATA packet with the error ENETUNREACH, EHOSTUNREACH or ECONNREFUSED. Without this notification, the afs client just sits there waiting for the call to complete in some manner (which it's not now going to do), which also pins the rxrpc call in place. This can be seen if the client has a scope-level IPv6 address, but not a global-level IPv6 address, and we try and transmit an operation to a server's IPv6 address. Looking in /proc/net/rxrpc/calls shows completed calls just sat there with an abort code of RX_USER_ABORT and an error code of -ENETUNREACH. Fixes: `c54e43d752` ("rxrpc: Fix missing start of call timeout") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>	2019-07-30 15:27:59 +01:00
David Howells	60034d3d14	rxrpc: Fix potential deadlock There is a potential deadlock in rxrpc_peer_keepalive_dispatch() whereby rxrpc_put_peer() is called with the peer_hash_lock held, but if it reduces the peer's refcount to 0, rxrpc_put_peer() calls __rxrpc_put_peer() - which the tries to take the already held lock. Fix this by providing a version of rxrpc_put_peer() that can be called in situations where the lock is already held. The bug may produce the following lockdep report: ============================================ WARNING: possible recursive locking detected 5.2.0-next-20190718 #41 Not tainted -------------------------------------------- kworker/0:3/21678 is trying to acquire lock: 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh /./include/linux/spinlock.h:343 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: __rxrpc_put_peer /net/rxrpc/peer_object.c:415 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_put_peer+0x2d3/0x6a0 /net/rxrpc/peer_object.c:435 but task is already holding lock: 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh /./include/linux/spinlock.h:343 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_peer_keepalive_dispatch /net/rxrpc/peer_event.c:378 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_peer_keepalive_worker+0x6b3/0xd02 /net/rxrpc/peer_event.c:430 Fixes: `330bdcfadc` ("rxrpc: Fix the keepalive generator [ver #2]") Reported-by: syzbot+72af434e4b3417318f84@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>	2019-07-30 14:42:50 +01:00
Johannes Berg	eef347f846	Revert "mac80211: set NETIF_F_LLTX when using intermediate tx queues" Revert this for now, it has been reported multiple times that it completely breaks connectivity on various devices. Cc: stable@vger.kernel.org Fixes: `8dbb000ee7` ("mac80211: set NETIF_F_LLTX when using intermediate tx queues") Reported-by: Jean Delvare <jdelvare@suse.de> Reported-by: Peter Lebbing <peter@digitalbrains.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-30 14:52:50 +02:00
Pablo Neira Ayuso	7cdc441228	Merge branch 'master' of git://blackhole.kfki.hu/nf Jozsef Kadlecsik says: ==================== ipset patches for the nf tree - When the support of destination MAC addresses for hash:mac sets was introduced, it was forgotten to add the same functionality to hash:ip,mac types of sets. The patch from Stefano Brivio adds the missing part. - When the support of destination MAC addresses for hash:mac sets was introduced, a copy&paste error was made in the code of the hash:ip,mac and bitmap:ip,mac types: the MAC address in these set types is in the second position and not in the first one. Stefano Brivio's patch fixes the issue. - There was still a not properly handled concurrency handling issue between renaming and listing sets at the same time, reported by Shijie Luo. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-30 13:39:20 +02:00
Florian Westphal	3b48300d5c	netfilter: ebtables: also count base chain policies ebtables doesn't include the base chain policies in the rule count, so we need to add them manually when we call into the x_tables core to allocate space for the comapt offset table. This lead syzbot to trigger: WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649 xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649 Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com Fixes: `2035f3ff8e` ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-30 13:37:44 +02:00
Oliver Hartkopp	473d924d7d	can: fix ioctl function removal Commit `60649d4e0a` ("can: remove obsolete empty ioctl() handler") replaced the almost empty can_ioctl() function with sock_no_ioctl() which always returns -EOPNOTSUPP. Even though we don't have any ioctl() functions on socket/network layer we need to return -ENOIOCTLCMD to be able to forward ioctl commands like SIOCGIFINDEX to the network driver layer. This patch fixes the wrong return codes in the CAN network layer protocols. Reported-by: kernel test robot <rong.a.chen@intel.com> Fixes: `60649d4e0a` ("can: remove obsolete empty ioctl() handler") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-29 14:12:35 -07:00
Enrico Weigelt	d4e575ba9f	net: sctp: drop unneeded likely() call around IS_ERR() IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-29 13:57:58 -07:00
Toke Høiland-Jørgensen	6f9d451ab1	xdp: Add devmap_hash map type for looking up devices by hashed index A common pattern when using xdp_redirect_map() is to create a device map where the lookup key is simply ifindex. Because device maps are arrays, this leaves holes in the map, and the map has to be sized to fit the largest ifindex, regardless of how many devices actually are actually needed in the map. This patch adds a second type of device map where the key is looked up using a hashmap, instead of being used as an array index. This allows maps to be densely packed, so they can be smaller. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-07-29 13:50:48 -07:00
Jozsef Kadlecsik	6c1f7e2c1b	netfilter: ipset: Fix rename concurrency with listing Shijie Luo reported that when stress-testing ipset with multiple concurrent create, rename, flush, list, destroy commands, it can result ipset <version>: Broken LIST kernel message: missing DATA part! error messages and broken list results. The problem was the rename operation was not properly handled with respect of listing. The patch fixes the issue. Reported-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>	2019-07-29 21:18:07 +02:00
Stefano Brivio	1b4a75108d	netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets In commit `8cc4ccf583` ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the KADT functions for sets matching on MAC addreses the copy of source or destination MAC address depending on the configured match. This was done correctly for hash:mac, but for hash:ip,mac and bitmap:ip,mac, copying and pasting the same code block presents an obvious problem: in these two set types, the MAC address is the second dimension, not the first one, and we are actually selecting the MAC address depending on whether the first dimension (IP address) specifies source or destination. Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags. This way, mixing source and destination matches for the two dimensions of ip,mac set types works as expected. With this setup: ip netns add A ip link add veth1 type veth peer name veth2 netns A ip addr add 192.0.2.1/24 dev veth1 ip -net A addr add 192.0.2.2/24 dev veth2 ip link set veth1 up ip -net A link set veth2 up dst=$(ip netns exec A cat /sys/class/net/veth2/address) ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16 ip netns exec A ipset add test_bitmap 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP ip netns exec A ipset create test_hash hash:ip,mac ip netns exec A ipset add test_hash 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP ipset correctly matches a test packet: # ping -c1 192.0.2.2 >/dev/null # echo $? 0 Reported-by: Chen Yi <yiche@redhat.com> Fixes: `8cc4ccf583` ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>	2019-07-29 21:17:30 +02:00
Stefano Brivio	b89d15480d	netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too In commit `8cc4ccf583` ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the KADT check that prevents matching on destination MAC addresses for hash:mac sets, but forgot to remove the same check for hash:ip,mac set. Drop this check: functionality is now commented in man pages and there's no reason to restrict to source MAC address matching anymore. Reported-by: Chen Yi <yiche@redhat.com> Fixes: `8cc4ccf583` ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>	2019-07-29 21:16:57 +02:00
Jiri Pirko	55b40dbf0e	net: fix ifindex collision during namespace removal Commit `aca51397d0` ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.") introduced a possibility to hit a BUG in case device is returning back to init_net and two following conditions are met: 1) dev->ifindex value is used in a name of another "dev%d" device in init_net. 2) dev->name is used by another device in init_net. Under real life circumstances this is hard to get. Therefore this has been present happily for over 10 years. To reproduce: $ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff 3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff $ ip netns add ns1 $ ip -n ns1 link add dummy1ns1 type dummy $ ip -n ns1 link add dummy2ns1 type dummy $ ip link set enp0s2 netns ns1 $ ip -n ns1 link set enp0s2 name dummy0 [ 100.858894] virtio_net virtio0 dummy0: renamed from enp0s2 $ ip link add dev4 type dummy $ ip -n ns1 a 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff 3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff 4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff $ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff 4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff $ ip netns del ns1 [ 158.717795] default_device_exit: failed to move dummy0 to init_net: -17 [ 158.719316] ------------[ cut here ]------------ [ 158.720591] kernel BUG at net/core/dev.c:9824! [ 158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI [ 158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18 [ 158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [ 158.727508] Workqueue: netns cleanup_net [ 158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f [ 158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e [ 158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282 [ 158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000 [ 158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64 [ 158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c [ 158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000 [ 158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72 [ 158.750638] FS: 0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000 [ 158.752944] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0 [ 158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 158.762758] Call Trace: [ 158.763882] ? dev_change_net_namespace+0xbb0/0xbb0 [ 158.766148] ? devlink_nl_cmd_set_doit+0x520/0x520 [ 158.768034] ? dev_change_net_namespace+0xbb0/0xbb0 [ 158.769870] ops_exit_list.isra.0+0xa8/0x150 [ 158.771544] cleanup_net+0x446/0x8f0 [ 158.772945] ? unregister_pernet_operations+0x4a0/0x4a0 [ 158.775294] process_one_work+0xa1a/0x1740 [ 158.776896] ? pwq_dec_nr_in_flight+0x310/0x310 [ 158.779143] ? do_raw_spin_lock+0x11b/0x280 [ 158.780848] worker_thread+0x9e/0x1060 [ 158.782500] ? process_one_work+0x1740/0x1740 [ 158.784454] kthread+0x31b/0x420 [ 158.786082] ? __kthread_create_on_node+0x3f0/0x3f0 [ 158.788286] ret_from_fork+0x3a/0x50 [ 158.789871] ---[ end trace defd6c657c71f936 ]--- [ 158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f [ 158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e [ 158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282 [ 158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000 [ 158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64 [ 158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c [ 158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000 [ 158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72 [ 158.829899] FS: 0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000 [ 158.834923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0 [ 158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fix this by checking if a device with the same name exists in init_net and fallback to original code - dev%d to allocate name - in case it does. This was found using syzkaller. Fixes: `aca51397d0` ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-29 11:07:04 -07:00
Gustavo A. R. Silva	05bba1edaf	net/af_iucv: mark expected switch fall-throughs Mark switch cases where we are expecting to fall through. This patch fixes the following warnings: net/iucv/af_iucv.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 537:3, 519:6, 2246:6, 510:6 Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-29 10:26:14 -07:00
Nikolay Aleksandrov	d7bae09fa0	net: bridge: delete local fdb on device init failure On initialization failure we have to delete the local fdb which was inserted due to the default pvid creation. This problem has been present since the inception of default_pvid. Note that currently there are 2 cases: 1) in br_dev_init() when br_multicast_init() fails 2) if register_netdevice() fails after calling ndo_init() This patch takes care of both since br_vlan_flush() is called on both occasions. Also the new fdb delete would be a no-op on normal bridge device destruction since the local fdb would've been already flushed by br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is called last when adding a port thus nothing can fail after it. Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com Fixes: `5be5a2df40` ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-29 09:50:05 -07:00
Jia-Ju Bai	051c7b39be	net: sched: Fix a possible null-pointer dereference in dequeue_func() In dequeue_func(), there is an if statement on line 74 to check whether skb is NULL: if (skb) When skb is NULL, it is used on line 77: prefetch(&skb->end); Thus, a possible null-pointer dereference may occur. To fix this bug, skb->end is used when skb is not NULL. This bug is found by a static analysis tool STCheck written by us. Fixes: `76e3cc126b` ("codel: Controlled Delay AQM") Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-29 09:46:58 -07:00
Andrey Konovalov	18917d5147	NFC: fix attrs checks in netlink interface nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX attribute being present, but doesn't check whether it is actually provided by the user. Same goes for nfc_genl_fw_download() and NFC_ATTR_FIRMWARE_NAME. This patch adds appropriate checks. Found with syzkaller. Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-29 08:56:26 -07:00
John Crispin	2ab4587675	mac80211: add support for the ADDBA extension element HE allows peers to negotiate the aggregation fragmentation level to be used during transmission. The level can be 1-3. The Ext element is added behind the ADDBA request inside the action frame. The responder will then reply with the same level or a lower one if the requested one is not supported. This patch only handles the negotiation part as the ADDBA frames get passed to the ATH11k firmware, which does the rest of the magic for us aswell as generating the requests. Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com> Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190729104512.27615-1-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-29 16:40:22 +02:00
John Crispin	697f6c507c	mac80211: propagate HE operation info into bss_conf Upon a successful assoc a station shall store the content of the HE operation element inside bss_conf so that the driver can setup the hardware accordingly. Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20190729102342.8659-2-john@phrozen.org [use struct copy] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-29 16:39:37 +02:00
Michael Vassernis	d34990bbc2	cfg80211: fix dfs channels remain DFS_AVAILABLE after ch_switch Depending on the regulatory domain, leaving a DFS channel requires a new CAC to be performed when returning back to that channel. If needed, update dfs states after a driver channel switch. Signed-off-by: Michael Vassernis <michael.vassernis@tandemg.com> Link: https://lore.kernel.org/r/20190729060024.5660-1-michael.vassernis@tandemg.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-29 16:34:41 +02:00
Sergey Matyukevich	0afd425b1b	cfg80211: fix duplicated scan entries after channel switch When associated BSS completes channel switch procedure, its channel record needs to be updated. The existing mac80211 solution was extended to cfg80211 in commit `5dc8cdce1d` ("mac80211/cfg80211: update bss channel on channel switch"). However that solution still appears to be incomplete as it may lead to duplicated scan entries for associated BSS after channel switch. The root cause of the problem is as follows. Each BSS entry is included into the following data structures: - bss list rdev->bss_list - bss search tree rdev->bss_tree Updating BSS channel record without rebuilding bss_tree may break tree search since cmp_bss considers all of the following: channel, bssid, ssid. When BSS channel is updated, but its location in bss_tree is not updated, then subsequent search operations may fail to locate this BSS since they will be traversing bss_tree in wrong direction. As a result, for scan performed after associated BSS channel switch, cfg80211_bss_update may add the second entry for the same BSS to both bss_list and bss_tree, rather then update the existing one. To summarize, if BSS channel needs to be updated, then bss_tree should be rebuilt in order to put updated BSS entry into a proper location. This commit suggests the following straightforward solution: - if new entry has been already created for BSS after channel switch, then use its IEs to update known BSS entry and then remove new entry completely - use rb_erase/rb_insert_bss reinstall updated BSS in bss_tree - for nontransmit BSS entry, the whole transmit BSS hierarchy is updated Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20190726163922.27509-3-sergey.matyukevich.os@quantenna.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-29 16:23:13 +02:00
Sergey Matyukevich	3ab8227d3e	cfg80211: refactor cfg80211_bss_update This patch implements minor refactoring for cfg80211_bss_update function. Code path for updating known BSS is extracted into dedicated cfg80211_update_known_bss function. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20190726163922.27509-2-sergey.matyukevich.os@quantenna.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-29 16:23:13 +02:00
Brian Norris	05aaa5c97d	mac80211: don't WARN on short WMM parameters from AP In a very similar spirit to commit `c470bdc1aa` ("mac80211: don't WARN on bad WMM parameters from buggy APs"), an AP may not transmit a fully-formed WMM IE. For example, it may miss or repeat an Access Category. The above loop won't catch that and will instead leave one of the four ACs zeroed out. This triggers the following warning in drv_conf_tx() wlan0: invalid CW_min/CW_max: 0/0 and it may leave one of the hardware queues unconfigured. If we detect such a case, let's just print a warning and fall back to the defaults. Tested with a hacked version of hostapd, intentionally corrupting the IEs in hostapd_eid_wmm(). Cc: stable@vger.kernel.org Signed-off-by: Brian Norris <briannorris@chromium.org> Link: https://lore.kernel.org/r/20190726224758.210953-1-briannorris@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-29 16:19:44 +02:00
Jonathan Lemon	280b0b8e89	ipv6: remove printk ipv6_find_hdr() prints a non-rate limited error message when it cannot find an ipv6 header at a specific offset. This could be used as a DoS, so just remove it. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-27 14:23:48 -07:00
Jia-Ju Bai	c7ba50fe23	net: rds: Fix possible null-pointer dereferences in rds_rdma_cm_event_handler_cmn() In rds_rdma_cm_event_handler_cmn(), there are some if statements to check whether conn is NULL, such as on lines 65, 96 and 112. But conn is not checked before being used on line 108: trans->cm_connect_complete(conn, event); and on lines 140-143: rdsdebug("DISCONNECT event - dropping connection " "%pI6c->%pI6c\n", &conn->c_laddr, &conn->c_faddr); rds_conn_drop(conn); Thus, possible null-pointer dereferences may occur. To fix these bugs, conn is checked before being used. These bugs are found by a static analysis tool STCheck written by us. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-27 13:58:12 -07:00
Colin Ian King	f530eed65b	net: neigh: remove redundant assignment to variable bucket The variable bucket is being initialized with a value that is never read and it is being updated later with a new value in a following for-loop. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-27 13:32:06 -07:00
Haishuang Yan	01f5bffad5	ip6_tunnel: fix possible use-after-free on xmit ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which can cause a possible use-after-free accessing iph/ipv6h pointer since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb. Fixes: `0e9a709560` ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-26 14:17:44 -07:00
Denis Kenzior	ae6fa4d5e9	nl80211: Include wiphy address setup in NEW_WIPHY Include wiphy address setup in wiphy dumps and new wiphy events. The wiphy permanent address is exposed as ATTR_MAC. If addr_mask is setup, then it is included as ATTR_MAC_MASK attribute. If multiple addresses are available, then their are exposed in a nested ATTR_MAC_ADDRS array. This information is already exposed via sysfs, but it makes sense to include it in the wiphy dump as well. Signed-off-by: Denis Kenzior <denkenz@gmail.com> Link: https://lore.kernel.org/r/20190722113312.14031-3-denkenz@gmail.com [use just nla_nest_start(), this is new functionality] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 16:14:13 +02:00
Sven Eckelmann	60ad72da55	mac80211: implement HE support for mesh Implement the basics required for supporting high efficiency with mesh: include HE information elements in beacons, probe responses, and peering action frames, and check for compatible HE configurations when peering. Signed-off-by: Sven Eckelmann <seckelmann@datto.com> Forwarded: https://patchwork.kernel.org/patch/11029299/ Link: https://lore.kernel.org/r/20190724163359.3507-2-sven@narfation.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 16:14:12 +02:00
Lorenzo Bianconi	a0b4496a43	mac80211: add IEEE80211_KEY_FLAG_GENERATE_MMIE to ieee80211_key_flags Add IEEE80211_KEY_FLAG_GENERATE_MMIE flag to ieee80211_key_flags in order to allow the driver to notify mac80211 to generate MMIE and that it requires sequence number generation only. This is a preliminary patch to add BIP_CMAC_128 hw support to mt7615 driver Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/dfe275f9aa0f1cc6b33085f9efd5d8447f68ad13.1563228405.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 16:14:12 +02:00
John Crispin	ef11a931bd	mac80211: HE: add Spatial Reuse element parsing support Add support to mac80211 for parsing SPR elements as per P802.11ax_D4.0 section 9.4.2.241. Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com> Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190618061915.7102-2-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 16:14:12 +02:00
John Crispin	3d07ffcaf3	mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header Add support to ieee80211_add_tx_radiotap_header() for handling rates reported via ieee80211_tx_status. This allows us to also report HE rates. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190714154419.11854-4-john@phrozen.org [remove text about 60 GHz, mac80211 doesn't support it, fix endianness issue] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 16:14:06 +02:00
Manikanta Pubbisetty	e6f4051123	{nl,mac}80211: fix interface combinations on crypto controlled devices Commit `33d915d9e8` ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices") has introduced a change which allows 4addr operation on crypto controlled devices (ex: ath10k). This change has inadvertently impacted the interface combinations logic on such devices. General rule is that software interfaces like AP/VLAN should not be listed under supported interface combinations and should not be considered during validation of these combinations; because of the aforementioned change, AP/VLAN interfaces(if present) will be checked against interfaces supported by the device and blocks valid interface combinations. Consider a case where an AP and AP/VLAN are up and running; when a second AP device is brought up on the same physical device, this AP will be checked against the AP/VLAN interface (which will not be part of supported interface combinations of the device) and blocks second AP to come up. Add a new API cfg80211_iftype_allowed() to fix the problem, this API works for all devices with/without SW crypto control. Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org> Fixes: `33d915d9e8` ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices") Link: https://lore.kernel.org/r/1563779690-9716-1-git-send-email-mpubbise@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:50:43 +02:00
John Crispin	b7b2e8caa0	mac80211: propagate struct ieee80211_tx_status into ieee80211_tx_monitor() This will allow use to report HE rates on the radiotap interface. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190714154419.11854-3-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:41:45 +02:00
John Crispin	2aa485e114	mac80211: add support for parsing ADDBA_EXT IEs ADDBA_EXT IEs can be used to negotiate the BA fragmentation level. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190713163642.18491-2-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:32:07 +02:00
Johannes Berg	60d7dfea00	cfg80211: give all multi-BSSID BSS entries the same timestamp If we just read jiffies over and over again, a non-transmitting entry may have a newer timestamp than the transmitting one, leading to possible confusion on expiry. Give them all the same timestamp when creating them. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20190703133823.10530-3-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:31:46 +02:00
Johannes Berg	b0d1d7ffc5	cfg80211: don't parse MBSSID if transmitting BSS isn't created Don't parse the multi-BSSID structures if we couldn't even create their transmitting BSS, this would confuse all of our tracking. This also means that non_tx_data->tx_bss will never be NULL, so we can clean up a little bit. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20190703133823.10530-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:31:30 +02:00
Johannes Berg	84f1772bc0	cfg80211: clean up cfg80211_inform_single_bss_frame_data() cfg80211_inform_single_bss_frame_data() doesn't need the non_tx_data data argument since it's always NULL. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20190703133823.10530-1-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:31:00 +02:00
Greg Kroah-Hartman	d82574a8e5	cfg80211: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20190703070142.GA29993@kroah.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:30:43 +02:00
Alexander Wetzel	dc3998ec5c	mac80211: AMPDU handling for rekeys with Extended Key ID Extended Key ID allows A-MPDU sessions while rekeying as long as each A-MPDU aggregates only MPDUs with one keyid together. Drivers able to segregate MPDUs accordingly can tell mac80211 to not stop A-MPDU sessions when rekeying by setting the new flag IEEE80211_HW_AMPDU_KEYBORDER_SUPPORT. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20190629195015.19680-3-alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:29:10 +02:00
Alexander Wetzel	3e47bf1ca4	mac80211: Simplify Extended Key ID API 1) Drop IEEE80211_HW_EXT_KEY_ID_NATIVE and let drivers directly set the NL80211_EXT_FEATURE_EXT_KEY_ID flag. 2) Drop IEEE80211_HW_NO_AMPDU_KEYBORDER_SUPPORT and simply assume all drivers are unable to handle A-MPDU key borders. The new Extended Key ID API now requires all mac80211 drivers to set NL80211_EXT_FEATURE_EXT_KEY_ID when they implement set_key() and can handle Extended Key ID. For drivers not providing set_key() mac80211 itself enables Extended Key ID support, using the internal SW crypto services. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20190629195015.19680-2-alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:28:59 +02:00
Erik Stromdahl	fb0e76abe3	mac80211: add tx dequeue function for process context Since ieee80211_tx_dequeue() must not be called with softirqs enabled (i.e. from process context without proper disable of bottom halves), we add a wrapper that disables bottom halves before calling ieee80211_tx_dequeue() The new function is named ieee80211_tx_dequeue_ni() just as all other from-process-context versions found in mac80211. The documentation of ieee80211_tx_dequeue() is also updated so it mentions that the function should not be called from process context. Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com> Link: https://lore.kernel.org/r/20190617200140.6189-1-erik.stromdahl@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:23:19 +02:00
Ard Biesheuvel	a11e2f8548	lib80211: use crypto API ccm(aes) transform for CCMP processing Instead of open coding the CCM aead mode in the driver, and invoking the AES block cipher block by block, use a ccm(aes) aead transform which already encapsulates this functionality. This is a cleaner use of the crypto API, and permits optimized implementations to be used, which are typically much faster and deal more efficiently with the SIMD register file, which usually needs to be preserved/restored in order to use special AES instructions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Link: https://lore.kernel.org/r/20190617091901.7063-1-ard.biesheuvel@linaro.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:22:47 +02:00
Greg Kroah-Hartman	612fcfd9b3	mac80211: remove unused and unneeded remove_sta_debugfs callback The remove_sta_debugfs callback in struct rate_control_ops is no longer used by any driver, as there is no need for it (the debugfs directory is already removed recursivly by the mac80211 core.) Because no one needs it, just remove it to keep anyone else from accidentally using it in the future. Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20190612142658.12792-5-gregkh@linuxfoundation.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:21:12 +02:00
Emmanuel Grumbach	5db4c4b955	mac80211: pass the vif to cancel_remain_on_channel This low level driver can find it useful to get the vif when a remain on channel session is cancelled. iwlwifi will need this soon. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20190723180001.5828-1-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-07-26 13:08:28 +02:00
Stanislav Fomichev	71c99e32b9	bpf/flow_dissector: support ipv6 flow_label and BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL Add support for exporting ipv6 flow label via bpf_flow_keys. Export flow label from bpf_flow.c and also return early when BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL is passed. Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Petar Penkov <ppenkov@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-07-25 18:00:41 -07:00
Stanislav Fomichev	b2ca4e1cfa	bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN This will allow us to write tests for those flags. v2: * Swap kfree(data) and kfree(user_ctx) (Song Liu) Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Petar Penkov <ppenkov@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-07-25 18:00:41 -07:00
Stanislav Fomichev	086f956821	bpf/flow_dissector: pass input flags to BPF flow dissector program C flow dissector supports input flags that tell it to customize parsing by either stopping early or trying to parse as deep as possible. Pass those flags to the BPF flow dissector so it can make the same decisions. In the next commits I'll add support for those flags to our reference bpf_flow.c v3: * Export copy of flow dissector flags instead of moving (Alexei Starovoitov) Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Petar Penkov <ppenkov@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-07-25 18:00:41 -07:00
Allan Zhang	7c4b90d79d	bpf: Allow bpf_skb_event_output for a few prog types Software event output is only enabled by a few prog types right now (TC, LWT out, XDP, sockops). Many other skb based prog types need bpf_skb_event_output to produce software event. Added socket_filter, cg_skb, sk_skb prog types to generate sw event. Test bpf code is generated from code snippet: struct TMP { uint64_t tmp; } tt; tt.tmp = 5; bpf_perf_event_output(skb, &connection_tracking_event_map, 0, &tt, sizeof(tt)); return 1; the bpf assembly from llvm is: 0: b7 02 00 00 05 00 00 00 r2 = 5 1: 7b 2a f8 ff 00 00 00 00 (u64 )(r10 - 8) = r2 2: bf a4 00 00 00 00 00 00 r4 = r10 3: 07 04 00 00 f8 ff ff ff r4 += -8 4: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0ll 6: b7 03 00 00 00 00 00 00 r3 = 0 7: b7 05 00 00 08 00 00 00 r5 = 8 8: 85 00 00 00 19 00 00 00 call 25 9: b7 00 00 00 01 00 00 00 r0 = 1 10: 95 00 00 00 00 00 00 00 exit Signed-off-by: Allan Zhang <allanzhang@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-07-25 17:56:00 -07:00
David S. Miller	28ba934d28	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2019-07-25 The following pull-request contains BPF updates for your net tree. The main changes are: 1) fix segfault in libbpf, from Andrii. 2) fix gso_segs access, from Eric. 3) tls/sockmap fixes, from Jakub and John. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-25 17:35:03 -07:00
Haishuang Yan	47d858d0bd	ipip: validate header length in ipip_tunnel_xmit We need the same checks introduced by commit `cb9f1b7838` ("ip: validate header length on virtual device xmit") for ipip tunnel. Fixes: `cb9f1b7838` ("ip: validate header length on virtual device xmit") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-25 17:23:40 -07:00
Tuong Lien	2320bcdae6	tipc: fix changeover issues due to large packet In conjunction with changing the interfaces' MTU (e.g. especially in the case of a bonding) where the TIPC links are brought up and down in a short time, a couple of issues were detected with the current link changeover mechanism: 1) When one link is up but immediately forced down again, the failover procedure will be carried out in order to failover all the messages in the link's transmq queue onto the other working link. The link and node state is also set to FAILINGOVER as part of the process. The message will be transmited in form of a FAILOVER_MSG, so its size is plus of 40 bytes (= the message header size). There is no problem if the original message size is not larger than the link's MTU - 40, and indeed this is the max size of a normal payload messages. However, in the situation above, because the link has just been up, the messages in the link's transmq are almost SYNCH_MSGs which had been generated by the link synching procedure, then their size might reach the max value already! When the FAILOVER_MSG is built on the top of such a SYNCH_MSG, its size will exceed the link's MTU. As a result, the messages are dropped silently and the failover procedure will never end up, the link will not be able to exit the FAILINGOVER state, so cannot be re-established. 2) The same scenario above can happen more easily in case the MTU of the links is set differently or when changing. In that case, as long as a large message in the failure link's transmq queue was built and fragmented with its link's MTU > the other link's one, the issue will happen (there is no need of a link synching in advance). 3) The link synching procedure also faces with the same issue but since the link synching is only started upon receipt of a SYNCH_MSG, dropping the message will not result in a state deadlock, but it is not expected as design. The 1) & 3) issues are resolved by the last commit that only a dummy SYNCH_MSG (i.e. without data) is generated at the link synching, so the size of a FAILOVER_MSG if any then will never exceed the link's MTU. For the 2) issue, the only solution is trying to fragment the messages in the failure link's transmq queue according to the working link's MTU so they can be failovered then. A new function is made to accomplish this, it will still be a TUNNEL PROTOCOL/FAILOVER MSG but if the original message size is too large, it will be fragmented & reassembled at the receiving side. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-25 15:55:47 -07:00
Tuong Lien	4929a932be	tipc: optimize link synching mechanism This commit along with the next one are to resolve the issues with the link changeover mechanism. See that commit for details. Basically, for the link synching, from now on, we will send only one single ("dummy") SYNCH message to peer. The SYNCH message does not contain any data, just a header conveying the synch point to the peer. A new node capability flag ("TIPC_TUNNEL_ENHANCED") is introduced for backward compatible! Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Suggested-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-25 15:55:47 -07:00
Cong Wang	c8ec4632c6	ife: error out when nla attributes are empty act_ife at least requires TCA_IFE_PARMS, so we have to bail out when there is no attribute passed in. Reported-by: syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com Fixes: `ef6980b6be` ("introduce IFE action") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-25 11:21:39 -07:00
Phil Sutter	67d8683584	netfilter: nft_meta_bridge: Eliminate 'out' label The label is used just once and the code it points at is not reused, no point in keeping it. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-25 08:38:29 +02:00
Phil Sutter	cb81572e8c	netfilter: nf_tables: Make nft_meta expression more robust nft_meta_get_eval()'s tendency to bail out setting NFT_BREAK verdict in situations where required data is missing leads to unexpected behaviour with inverted checks like so: \| meta iifname != eth0 accept This rule will never match if there is no input interface (or it is not known) which is not intuitive and, what's worse, breaks consistency of iptables-nft with iptables-legacy. Fix this by falling back to placing a value in dreg which never matches (avoiding accidental matches), i.e. zero for interface index and an empty string for interface name. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-25 08:37:20 +02:00
Cong Wang	4638faac03	netrom: hold sock when setting skb->destructor sock_efree() releases the sock refcnt, if we don't hold this refcnt when setting skb->destructor to it, the refcnt would not be balanced. This leads to several bug reports from syzbot. I have checked other users of sock_efree(), all of them hold the sock refcnt. Fixes: `c8c8218ec5` ("netrom: fix a memory leak in nr_rx_frame()") Reported-and-tested-by: <syzbot+622bdabb128acc33427d@syzkaller.appspotmail.com> Reported-and-tested-by: <syzbot+6eaef7158b19e3fec3a0@syzkaller.appspotmail.com> Reported-and-tested-by: <syzbot+9399c158fcc09b21d0d2@syzkaller.appspotmail.com> Reported-and-tested-by: <syzbot+a34e5f3d0300163f0c87@syzkaller.appspotmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-24 15:49:05 -07:00
Arnd Bergmann	260637903f	ovs: datapath: hide clang frame-overflow warnings Some functions in the datapath code are factored out so that each one has a stack frame smaller than 1024 bytes with gcc. However, when compiling with clang, the functions are inlined more aggressively and combined again so we get net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=] Marking both get_flow_actions() and ovs_nla_init_match_and_action() as 'noinline_for_stack' gives us the same behavior that we see with gcc, and no warning. Note that this does not mean we actually use less stack, as the functions call each other, and we still get three copies of the large 'struct sw_flow_key' type on the stack. The comment tells us that this was previously considered safe, presumably since the netlink parsing functions are called with a known backchain that does not also use a lot of stack space. Fixes: `9cc9a5cb17` ("datapath: Avoid using stack larger than 1024.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-24 15:45:11 -07:00
David S. Miller	09ea26792a	linux-can-fixes-for-5.3-20190724 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl04JHsTHG1rbEBwZW5n dXRyb25peC5kZQAKCRBaxiGjkeSdIB52B/9S17IKTEpoNwM5KvrR+ZMqWZTx6zng iKNef14uuAemHzThMoSAJoJBGGzcgmZeu/nd0fyGg0f51kXVkapU4TsvYa2azkt9 WcTZQG3N4uUuW3D8hEbY2+au8e8ubNR+NnWQEqeoa5yuDGg7nSjD3wYd7rcY6jF5 veSGoZFQ9+MpjuYBHqtwiAauPAUqj8pHPAawGsaT/eNvsTHa7y43Iy38VbS2XTrF Ds6mwp8unE0cuBUFD49YLs3JrYsbuQRf5q2kX3/qvIO9QtJ/GQ6Dt/Sy3iC7Eo9Y PAVLK90sNb+LlRBKhfCAKUdgZsxdJ/kv6RfVDOiCoQDMVl/Ukv9UWB9Y =1E76 -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.3-20190724' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2019-07-24 this is a pull reqeust of 7 patches for net/master. The first patch is by Rasmus Villemoes add a missing netif_carrier_off() to register_candev() so that generic netdev trigger based LEDs are initially off. Nikita Yushchenko's patch for the rcar_canfd driver fixes a possible IRQ storm on high load. The patch by Weitao Hou for the mcp251x driver add missing error checking to the work queue allocation. Both Wen Yang's and Joakim Zhang's patch for the flexcan driver fix a problem with the stop-mode. Stephane Grosjean contributes a patch for the peak_usb driver to fix a potential double kfree_skb(). The last patch is by YueHaibing and fixes the error path in can-gw's cgw_module_init() function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-24 14:14:50 -07:00
Haishuang Yan	3bc817d665	ip6_gre: reload ipv6h in prepare_ip6gre_xmit_ipv6 Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull() which may change skb->data, so we need to re-load ipv6h at the right place. Fixes: `898b29798e` ("ip6_gre: Refactor ip6gre xmit codes") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-24 13:59:25 -07:00
Pavel Machek	c7148c03db	net/ipv4: cleanup error condition testing Cleanup testing for error condition. Signed-off-by: Pavel Machek <pavel@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-24 13:56:37 -07:00
YueHaibing	b7a14297f1	can: gw: Fix error path of cgw_module_init This patch add error path for cgw_module_init to avoid possible crash if some error occurs. Fixes: `c1aabdf379` ("can-gw: add netlink based CAN routing") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-07-24 11:19:03 +02:00
Oliver Hartkopp	fba76a5845	can: Add SPDX license identifiers for CAN subsystem Add missing SPDX identifiers for the CAN network layer and correct the SPDX license for two of its include files to make sure the BSD-3-Clause applies for the entire subsystem. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-07-24 10:31:55 +02:00

1 2 3 4 5 ...

57339 Commits