OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Vinicius Costa Gomes	9c66d15646	taprio: Add support for hardware offloading This allows taprio to offload the schedule enforcement to capable network cards, resulting in more precise windows and less CPU usage. The gate mask acts on traffic classes (groups of queues of same priority), as specified in IEEE 802.1Q-2018, and following the existing taprio and mqprio semantics. It is up to the driver to perform conversion between tc and individual netdev queues if for some reason it needs to make that distinction. Full offload is requested from the network interface by specifying "flags 2" in the tc qdisc creation command, which in turn corresponds to the TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD bit. The important detail here is the clockid which is implicitly /dev/ptpN for full offload, and hence not configurable. A reference counting API is added to support the use case where Ethernet drivers need to keep the taprio offload structure locally (i.e. they are a multi-port switch driver, and configuring a port depends on the settings of other ports as well). The refcount_t variable is kept in a private structure (__tc_taprio_qopt_offload) and not exposed to drivers. In the future, the private structure might also be expanded with a backpointer to taprio_sched *q, to implement the notification system described in the patch (of when admin became oper, or an error occurred, etc, so the offload can be monitored with 'tc qdisc show'). Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-16 21:32:57 +02:00
Thomas Higdon	8f7baad7f0	tcp: Add snd_wnd to TCP_INFO Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP performance problems -- > (1) Usually when we're diagnosing TCP performance problems, we do so > from the sender, since the sender makes most of the > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc). > From the sender-side the thing that would be most useful is to see > tp->snd_wnd, the receive window that the receiver has advertised to > the sender. This serves the purpose of adding an additional __u32 to avoid the would-be hole caused by the addition of the tcpi_rcvi_ooopack field. Signed-off-by: Thomas Higdon <tph@fb.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-16 16:26:11 +02:00
Thomas Higdon	f9af2dbbfe	tcp: Add TCP_INFO counter for packets received out-of-order For receive-heavy cases on the server-side, we want to track the connection quality for individual client IPs. This counter, similar to the existing system-wide TCPOFOQueue counter in /proc/net/netstat, tracks out-of-order packet reception. By providing this counter in TCP_INFO, it will allow understanding to what degree receive-heavy sockets are experiencing out-of-order delivery and packet drops indicating congestion. Please note that this is similar to the counter in NetBSD TCP_INFO, and has the same name. Also note that we avoid increasing the size of the tcp_sock struct by taking advantage of a hole. Signed-off-by: Thomas Higdon <tph@fb.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-16 16:26:11 +02:00
David S. Miller	28f2c362db	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2019-09-16 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Now that initial BPF backend for gcc has been merged upstream, enable BPF kselftest suite for bpf-gcc. Also fix a BE issue with access to bpf_sysctl.file_pos, from Ilya. 2) Follow-up fix for link-vmlinux.sh to remove bash-specific extensions related to recent work on exposing BTF info through sysfs, from Andrii. 3) AF_XDP zero copy fixes for i40e and ixgbe driver which caused umem headroom to be added twice, from Ciara. 4) Refactoring work to convert sock opt tests into test_progs framework in BPF kselftests, from Stanislav. 5) Fix a general protection fault in dev_map_hash_update_elem(), from Toke. 6) Cleanup to use BPF_PROG_RUN() macro in KCM, from Sami. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-16 16:02:03 +02:00
Ilya Dryomov	cf73d882cc	libceph: use ceph_kvmalloc() for osdmap arrays osdmap has a bunch of arrays that grow linearly with the number of OSDs. osd_state, osd_weight and osd_primary_affinity take 4 bytes per OSD. osd_addr takes 136 bytes per OSD because of sockaddr_storage. The CRUSH workspace area also grows linearly with the number of OSDs. Normally these arrays are allocated at client startup. The osdmap is usually updated in small incrementals, but once in a while a full map may need to be processed. For a cluster with 10000 OSDs, this means a bunch of 40K allocations followed by a 1.3M allocation, all of which are currently required to be physically contiguous. This results in sporadic ENOMEM errors, hanging the client. Go back to manually (re)allocating arrays and use ceph_kvmalloc() to fall back to non-contiguous allocation when necessary. Link: https://tracker.ceph.com/issues/40481 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2019-09-16 12:06:25 +02:00
Ilya Dryomov	10c12851a0	libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc() The vmalloc allocator doesn't fully respect the specified gfp mask: while the actual pages are allocated as requested, the page table pages are always allocated with GFP_KERNEL. ceph_kvmalloc() may be called with GFP_NOFS and GFP_NOIO (for ceph and rbd respectively), so this may result in a deadlock. There is no real reason for the current PAGE_ALLOC_COSTLY_ORDER logic, it's just something that seemed sensible at the time (ceph_kvmalloc() predates kvmalloc()). kvmalloc() is smarter: in an attempt to reduce long term fragmentation, it first tries to kmalloc non-disruptively. Switch to kvmalloc() and set the respective PF_MEMALLOC_* flag using the scope API to avoid the deadlock. Note that kvmalloc() needs to be passed GFP_KERNEL to enable the fallback. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2019-09-16 12:06:25 +02:00
Ilya Dryomov	8edf84ba4d	libceph: drop unused con parameter of calc_target() This bit was omitted from `a561372405` ("libceph: fix PG split vs OSD (re)connect race") to avoid backport conflicts. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-09-16 12:06:25 +02:00
David Disseldorp	4766815b11	libceph: handle OSD op ceph_pagelist_append() errors osd_req_op_cls_init() and osd_req_op_xattr_init() currently propagate ceph_pagelist_alloc() ENOMEM errors but ignore ceph_pagelist_append() memory allocation failures. Add these checks and cleanup on error. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-09-16 12:06:25 +02:00
Yan, Zheng	2cef0ba803	libceph: add function that clears osd client's abort_err Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-09-16 12:06:23 +02:00
Yan, Zheng	120a75ea9f	libceph: add function that reset client's entity addr This function also re-open connections to OSD/MON, and re-send in-flight OSD requests after re-opening connections to OSD. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-09-16 12:06:23 +02:00
Vlad Buslov	470d5060e6	net: sched: use get_dev() action API in flow_action infra When filling in hardware intermediate representation tc_setup_flow_action() directly obtains, checks and takes reference to dev used by mirred action, instead of using act->ops->get_dev() API created specifically for this purpose. In order to remove code duplication, refactor flow_action infra to use action API when obtaining mirred action target dev. Extend get_dev() with additional argument that is used to provide dev destructor to the user. Fixes: `5a6ff4b13d` ("net: sched: take reference to action dev before calling offloads") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-16 09:18:03 +02:00
Vlad Buslov	4a5da47d5c	net: sched: take reference to psample group in flow_action infra With recent patch set that removed rtnl lock dependency from cls hardware offload API rtnl lock is only taken when reading action data and can be released after action-specific data is parsed into intermediate representation. However, sample action psample group is passed by pointer without obtaining reference to it first, which makes it possible to concurrently overwrite the action and deallocate object pointed by psample_group pointer after rtnl lock is released but before driver finished using the pointer. To prevent such race condition, obtain reference to psample group while it is used by flow_action infra. Extend psample API with function psample_group_take() that increments psample group reference counter. Extend struct tc_action_ops with new get_psample_group() API. Implement the API for action sample using psample_group_take() and already existing psample_group_put() as a destructor. Use it in tc_setup_flow_action() to take reference to psample group pointed to by entry->sample.psample_group and release it in tc_cleanup_flow_action(). Disable bh when taking psample_groups_lock. The lock is now taken while holding action tcf_lock that is used by data path and requires bh to be disabled, so doing the same for psample_groups_lock is necessary to preserve SOFTIRQ-irq-safety. Fixes: `918190f50e` ("net: sched: flower: don't take rtnl lock for cls hw offloads API") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-16 09:18:03 +02:00
Vlad Buslov	1158958a21	net: sched: extend flow_action_entry with destructor Generalize flow_action_entry cleanup by extending the structure with pointer to destructor function. Set the destructor in tc_setup_flow_action(). Refactor tc_cleanup_flow_action() to call entry->destructor() instead of using switch that dispatches by entry->id and manually executes cleanup. This refactoring is necessary for following patches in this series that require destructor to use tc_action->ops callbacks that can't be easily obtained in tc_cleanup_flow_action(). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-16 09:18:02 +02:00
Xin Long	28e4860377	ip6_gre: fix a dst leak in ip6erspan_tunnel_xmit In ip6erspan_tunnel_xmit(), if the skb will not be sent out, it has to be freed on the tx_err path. Otherwise when deleting a netns, it would cause dst/dev to leak, and dmesg shows: unregister_netdevice: waiting for lo to become free. Usage count = 1 Fixes: `ef7baf5e08` ("ip6_gre: add ip6 erspan collect_md mode") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-16 09:08:18 +02:00
Willem de Bruijn	acdcecc612	udp: correct reuseport selection with connected sockets UDP reuseport groups can hold a mix unconnected and connected sockets. Ensure that connections only receive all traffic to their 4-tuple. Fast reuseport returns on the first reuseport match on the assumption that all matches are equal. Only if connections are present, return to the previous behavior of scoring all sockets. Record if connections are present and if so (1) treat such connected sockets as an independent match from the group, (2) only return 2-tuple matches from reuseport and (3) do not return on the first 2-tuple reuseport match to allow for a higher scoring match later. New field has_conns is set without locks. No other fields in the bitmap are modified at runtime and the field is only ever set unconditionally, so an RMW cannot miss a change. Fixes: `e32ea7e747` ("soreuseport: fast reuseport UDP socket selection") Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Craig Gallek <kraig@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-16 09:02:18 +02:00
Gerd Rausch	05a82481a3	net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names' All entries in 'rds_ib_stat_names' are stringified versions of the corresponding "struct rds_ib_statistics" element without the "s_"-prefix. Fix entry 'ib_evt_handler_call' to do the same. Fixes: `f4f943c958` ("RDS: IB: ack more receive completions to improve performance") Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-15 20:56:19 +02:00
Cong Wang	6efb971ba8	net_sched: let qdisc_put() accept NULL pointer When tcf_block_get() fails in sfb_init(), q->qdisc is still a NULL pointer which leads to a crash in sfb_destroy(). Similar for sch_dsmark. Instead of fixing each separately, Linus suggested to just accept NULL pointer in qdisc_put(), which would make callers easier. (For sch_dsmark, the bug probably exists long before commit 6529eaba33f0.) Fixes: `6529eaba33` ("net: sched: introduce tcf block infractructure") Reported-by: syzbot+d5870a903591faaca4ae@syzkaller.appspotmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-15 20:54:14 +02:00
Andrew Lunn	23426a25e5	net: dsa: Fix load order between DSA drivers and taggers The DSA core, DSA taggers and DSA drivers all make use of module_init(). Hence they get initialised at device_initcall() time. The ordering is non-deterministic. It can be a DSA driver is bound to a device before the needed tag driver has been initialised, resulting in the message: No tagger for this switch Rather than have this be fatal, return -EPROBE_DEFER so that it is tried again later once all the needed drivers have been loaded. Fixes: `d3b8c04988` ("dsa: Add boilerplate helper to register DSA tag driver modules") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-15 20:49:19 +02:00
Paolo Abeni	d518d2ed86	net/sched: fix race between deactivation and dequeue for NOLOCK qdisc The test implemented by some_qdisc_is_busy() is somewhat loosy for NOLOCK qdisc, as we may hit the following scenario: CPU1 CPU2 // in net_tx_action() clear_bit(__QDISC_STATE_SCHED...); // in some_qdisc_is_busy() val = (qdisc_is_running(q) \|\| test_bit(__QDISC_STATE_SCHED, &q->state)); // here val is 0 but... qdisc_run(q) // ... CPU1 is going to run the qdisc next As a conseguence qdisc_run() in net_tx_action() can race with qdisc_reset() in dev_qdisc_reset(). Such race is not possible for !NOLOCK qdisc as both the above bit operations are under the root qdisc lock(). After commit `021a17ed79` ("pfifo_fast: drop unneeded additional lock on dequeue") the race can cause use after free and/or null ptr dereference, but the root cause is likely older. This patch addresses the issue explicitly checking for deactivation under the seqlock for NOLOCK qdisc, so that the qdisc_run() in the critical scenario becomes a no-op. Note that the enqueue() op can still execute concurrently with dev_qdisc_reset(), but that is safe due to the skb_array() locking, and we can't avoid that for NOLOCK qdiscs. Fixes: `021a17ed79` ("pfifo_fast: drop unneeded additional lock on dequeue") Reported-by: Li Shuang <shuali@redhat.com> Reported-and-tested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-15 20:35:55 +02:00
David S. Miller	aa2eaa8c27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-15 14:17:27 +02:00
Jiri Pirko	2670ac2625	net: devlink: move reload fail indication to devlink core and expose to user Currently the fact that devlink reload failed is stored in drivers. Move this flag into devlink core. Also, expose it to the user. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-13 22:11:14 +02:00
Jiri Pirko	97691069dc	net: devlink: split reload op into two In order to properly implement failure indication during reload, split the reload op into two ops, one for down phase and one for up phase. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-13 22:11:14 +02:00
Mao Wenan	29b99f54a8	sctp: destroy bucket if failed to bind addr There is one memory leak bug report: BUG: memory leak unreferenced object 0xffff8881dc4c5ec0 (size 40): comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s) hex dump (first 32 bytes): 02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00 ................ f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00 .c=............. backtrace: [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp] [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp] [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp] [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6] [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647 [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline] [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline] [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656 [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296 [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe This is because in sctp_do_bind, if sctp_get_port_local is to create hash bucket successfully, and sctp_add_bind_addr failed to bind address, e.g return -ENOMEM, so memory leak found, it needs to destroy allocated bucket. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-13 22:06:20 +02:00
Mao Wenan	e0e4b8de10	sctp: remove redundant assignment when call sctp_get_port_local There are more parentheses in if clause when call sctp_get_port_local in sctp_do_bind, and redundant assignment to 'ret'. This patch is to do cleanup. Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-13 22:06:20 +02:00
Mao Wenan	8e2ef6abd4	sctp: change return type of sctp_get_port_local Currently sctp_get_port_local() returns a long which is either 0,1 or a pointer casted to long. It's neither of the callers use the return value since commit `62208f1245` ("net: sctp: simplify sctp_get_port"). Now two callers are sctp_get_port and sctp_do_bind, they actually assumend a casted to an int was the same as a pointer casted to a long, and they don't save the return value just check whether it is zero or non-zero, so it would better change return type from long to int for sctp_get_port_local. Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-13 22:06:20 +02:00
Jason Gunthorpe	75c66515e4	Merge tag 'v5.3-rc8' into rdma.git for-next To resolve dependencies in following patches mlx5_ib.h conflict resolved by keeing both hunks Linux 5.3-rc8 Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:59:51 -03:00
Willem de Bruijn	c6af0c227a	ip: support SO_MARK cmsg Enable setting skb->mark for UDP and RAW sockets using cmsg. This is analogous to existing support for TOS, TTL, txtime, etc. Packet sockets already support this as of commit `c7d39e3263` ("packet: support per-packet fwmark for af_packet sendmsg"). Similar to other fields, implement by 1. initialize the sockcm_cookie.mark from socket option sk_mark 2. optionally overwrite this in ip_cmsg_send/ip6_datagram_send_ctl 3. initialize inet_cork.mark from sockcm_cookie.mark 4. initialize each (usually just one) skb->mark from inet_cork.mark Step 1 is handled in one location for most protocols by ipcm_init_sk as of commit `351782067b` ("ipv4: ipcm_cookie initializers"). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-13 21:44:19 +02:00
David S. Miller	022c10d6c7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Fix error path of nf_tables_updobj(), from Dan Carpenter. 2) Move large structure away from stack in the nf_tables offload infrastructure, from Arnd Bergmann. 3) Move indirect flow_block logic to nf_tables_offload. 4) Support for synproxy objects, from Fernando Fernandez Mancera. 5) Support for fwd and dup offload. 6) Add __nft_offload_get_chain() helper, this implicitly fixes missing mutex and check for offload flags in the indirect block support, patch from wenxu. 7) Remove rules on device unregistration, from wenxu. This includes two preparation patches to reuse nft_flow_offload_chain() and nft_flow_offload_rule(). Large batch from Jeremy Sowden to make a second pass to the CONFIG_HEADER_TEST support and a bit of housekeeping: 8) Missing include guard in conntrack label header, from Jeremy Sowden. 9) A few coding style errors: trailing whitespace, incorrect indent in Kconfig, and semicolons at the end of function definitions. 10) Remove unused ipt_init() and ip6t_init() declarations. 11) Inline xt_hashlimit, ebt_802_3 and xt_physdev headers. They are only used once. 12) Update include directive in several netfilter files. 13) Remove unused include/net/netfilter/ipv6/nf_conntrack_icmpv6.h. 14) Move nf_ip6_ext_hdr() to include/linux/netfilter_ipv6.h 15) Move several synproxy structure definitions to nf_synproxy.h 16) Move nf_bridge_frag_data structure to include/linux/netfilter_bridge.h 17) Clean up static inline definitions in nf_conntrack_ecache.h. 18) Replace defined(CONFIG...) \|\| defined(CONFIG...MODULE) with IS_ENABLED(CONFIG...). 19) Missing inline function conditional definitions based on Kconfig preferences in synproxy and nf_conntrack_timeout. 20) Update br_nf_pre_routing_ipv6() definition. 21) Move conntrack code in linux/skbuff.h to nf_conntrack headers. 22) Several patches to remove superfluous CONFIG_NETFILTER and CONFIG_NF_CONNTRACK checks in headers, coming from the initial batch support for CONFIG_HEADER_TEST for netfilter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-13 14:22:43 +01:00
Jeremy Sowden	261db6c2fb	netfilter: conntrack: move code to linux/nf_conntrack_common.h. Move some `struct nf_conntrack` code from linux/skbuff.h to linux/nf_conntrack_common.h. Together with a couple of helpers for getting and setting skb->_nfct, it allows us to remove CONFIG_NF_CONNTRACK checks from net/netfilter/nf_conntrack.h. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 12:47:11 +02:00
Jeremy Sowden	46705b070c	netfilter: move nf_bridge_frag_data struct definition to a more appropriate header. There is a struct definition function in nf_conntrack_bridge.h which is not specific to conntrack and is used elswhere in netfilter. Move it into netfilter_bridge.h. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 12:35:33 +02:00
Jeremy Sowden	44dde23698	netfilter: move inline nf_ip6_ext_hdr() function to a more appropriate header. There is an inline function in ip6_tables.h which is not specific to ip6tables and is used elswhere in netfilter. Move it into netfilter_ipv6.h and update the callers. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 12:34:09 +02:00
Jeremy Sowden	8bf3cbe32b	netfilter: remove nf_conntrack_icmpv6.h header. nf_conntrack_icmpv6.h contains two object macros which duplicate macros in linux/icmpv6.h. The latter definitions are also visible wherever it is included, so remove it. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 12:33:06 +02:00
Jeremy Sowden	40d102cde0	netfilter: update include directives. Include some headers in files which require them, and remove others which are not required. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 12:33:06 +02:00
Jeremy Sowden	85cfbc25e5	netfilter: inline xt_hashlimit, ebt_802_3 and xt_physdev headers Three netfilter headers are only included once. Inline their contents at those sites and remove them. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 12:32:48 +02:00
Jeremy Sowden	b0edba2af7	netfilter: fix coding-style errors. Several header-files, Kconfig files and Makefiles have trailing white-space. Remove it. In netfilter/Kconfig, indent the type of CONFIG_NETFILTER_NETLINK_ACCT correctly. There are semicolons at the end of two function definitions in include/net/netfilter/nf_conntrack_acct.h and include/net/netfilter/nf_conntrack_ecache.h. Remove them. Fix indentation in nf_conntrack_l4proto.h. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 11:39:38 +02:00
wenxu	06d392cbe3	netfilter: nf_tables_offload: remove rules when the device unregisters If the net_device unregisters, clean up the offload rules before the chain is destroy. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 10:58:10 +02:00
wenxu	e211aab73d	netfilter: nf_tables_offload: refactor the nft_flow_offload_rule function Pass rule, chain and flow_rule object parameters to nft_flow_offload_rule to reuse it. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 10:12:05 +02:00
wenxu	8fc618c52d	netfilter: nf_tables_offload: refactor the nft_flow_offload_chain function Pass chain and policy parameters to nft_flow_offload_chain to reuse it. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 10:11:57 +02:00
wenxu	504882db83	netfilter: nf_tables_offload: add __nft_offload_get_chain function Add __nft_offload_get_chain function to get basechain from device. This function requires that caller holds the per-netns nftables mutex. This patch implicitly fixes missing offload flags check and proper mutex from nft_indr_block_cb(). Fixes: `9a32669fec` ("netfilter: nf_tables_offload: support indr block call") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 01:09:15 +02:00
Christophe JAILLET	b456d72412	sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()' The '.exit' functions from 'pernet_operations' structure should be marked as __net_exit, not __net_init. Fixes: `8e2d61e0ae` ("sctp: fix race on protocol/netns initialization") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-12 12:55:28 +01:00
Navid Emamdoost	a21b7f0cff	net: qrtr: fix memort leak in qrtr_tun_write_iter In qrtr_tun_write_iter the allocated kbuf should be release in case of error or success return. v2 Update: Thanks to David Miller for pointing out the release on success path as well. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-12 11:58:44 +01:00
Subash Abhinov Kasiviswanathan	10cc514f45	net: Fix null de-reference of device refcount In event of failure during register_netdevice, free_netdev is invoked immediately. free_netdev assumes that all the netdevice refcounts have been dropped prior to it being called and as a result frees and clears out the refcount pointer. However, this is not necessarily true as some of the operations in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for invocation after a grace period. The IPv4 callback in_dev_rcu_put tries to access the refcount after free_netdev is called which leads to a null de-reference- 44837.761523: <6> Unable to handle kernel paging request at virtual address 0000004a88287000 44837.761651: <2> pc : in_dev_finish_destroy+0x4c/0xc8 44837.761654: <2> lr : in_dev_finish_destroy+0x2c/0xc8 44837.762393: <2> Call trace: 44837.762398: <2> in_dev_finish_destroy+0x4c/0xc8 44837.762404: <2> in_dev_rcu_put+0x24/0x30 44837.762412: <2> rcu_nocb_kthread+0x43c/0x468 44837.762418: <2> kthread+0x118/0x128 44837.762424: <2> ret_from_fork+0x10/0x1c Fix this by waiting for the completion of the call_rcu() in case of register_netdevice errors. Fixes: `93ee31f14f` ("[NET]: Fix free_netdev on register_netdev failure.") Cc: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-12 11:55:34 +01:00
George McCollister	f4073e9164	net: dsa: microchip: remove NET_DSA_TAG_KSZ_COMMON Remove the superfluous NET_DSA_TAG_KSZ_COMMON and just use the existing NET_DSA_TAG_KSZ. Update the description to mention the three switch families it supports. No functional change. Signed-off-by: George McCollister <george.mccollister@gmail.com> Reviewed-by: Marek Vasut <marex@denx.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-12 11:36:12 +01:00
Christophe JAILLET	d23dbc479a	ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' The '.exit' functions from 'pernet_operations' structure should be marked as __net_exit, not __net_init. Fixes: `d862e54614` ("net: ipv6: Implement /proc/net/icmp6.") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-12 11:20:33 +01:00
Eric Dumazet	051ba67447	tcp: force a PSH flag on TSO packets When tcp sends a TSO packet, adding a PSH flag on it reduces the sojourn time of GRO packet in GRO receivers. This is particularly the case under pressure, since RX queues receive packets for many concurrent flows. A sender can give a hint to GRO engines when it is appropriate to flush a super-packet, especially when pacing is in the picture, since next packet is probably delayed by one ms. Having less packets in GRO engine reduces chance of LRU eviction or inflated RTT, and reduces GRO cost. We found recently that we must not set the PSH flag on individual full-size MSS segments [1] : Under pressure (CWR state), we better let the packet sit for a small delay (depending on NAPI logic) so that the ACK packet is delayed, and thus next packet we send is also delayed a bit. Eventually the bottleneck queue can be drained. DCTCP flows with CWND=1 have demonstrated the issue. This patch allows to slowdown the aggregate traffic without involving high resolution timers on senders and/or receivers. It has been used at Google for about four years, and has been discussed at various networking conferences. [1] segments smaller than MSS already have PSH flag set by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE has been requested by the user. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tariq Toukan <tariqt@mellanox.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-11 23:59:01 +01:00
Neal Cardwell	af38d07ed3	tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR Fix tcp_ecn_withdraw_cwr() to clear the correct bit: TCP_ECN_QUEUE_CWR. Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about the behavior of data receivers, and deciding whether to reflect incoming IP ECN CE marks as outgoing TCP th->ece marks. The TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders, and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function is only called from tcp_undo_cwnd_reduction() by data senders during an undo, so it should zero the sender-side state, TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of incoming CE bits on incoming data packets just because outgoing packets were spuriously retransmitted. The bug has been reproduced with packetdrill to manifest in a scenario with RFC3168 ECN, with an incoming data packet with CE bit set and carrying a TCP timestamp value that causes cwnd undo. Before this fix, the IP CE bit was ignored and not reflected in the TCP ECE header bit, and sender sent a TCP CWR ('W') bit on the next outgoing data packet, even though the cwnd reduction had been undone. After this fix, the sender properly reflects the CE bit and does not set the W bit. Note: the bug actually predates 2005 git history; this Fixes footer is chosen to be the oldest SHA1 I have tested (from Sep 2007) for which the patch applies cleanly (since before this commit the code was in a .h file). Fixes: `bdf1ee5d3b` ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it") Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-11 23:53:18 +01:00
Stefano Brivio	cbfd68913c	ipv6: Don't use dst gateway directly in ip6_confirm_neigh() This is the equivalent of commit `2c6b55f45d` ("ipv6: fix neighbour resolution with raw socket") for ip6_confirm_neigh(): we can send a packet with MSG_CONFIRM on a raw socket for a connected route, so the gateway would be :: here, and we should pick the next hop using rt6_nexthop() instead. This was found by code review and, to the best of my knowledge, doesn't actually fix a practical issue: the destination address from the packet is not considered while confirming a neighbour, as ip6_confirm_neigh() calls choose_neigh_daddr() without passing the packet, so there are no similar issues as the one fixed by said commit. A possible source of issues with the existing implementation might come from the fact that, if we have a cached dst, we won't consider it, while rt6_nexthop() takes care of that. I might just not be creative enough to find a practical problem here: the only way to affect this with cached routes is to have one coming from an ICMPv6 redirect, but if the next hop is a directly connected host, there should be no topology for which a redirect applies here, and tests with redirected routes show no differences for MSG_CONFIRM (and MSG_PROBE) packets on raw sockets destined to a directly connected host. However, directly using the dst gateway here is not consistent anymore with neighbour resolution, and, in general, as we want the next hop, using rt6_nexthop() looks like the only sane way to fetch it. Reported-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Guillaume Nault <gnault@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-11 23:52:17 +01:00
Ka-Cheong Poon	c5c1a030a7	net/rds: An rds_sock is added too early to the hash table In rds_bind(), an rds_sock is added to the RDS bind hash table before rs_transport is set. This means that the socket can be found by the receive code path when rs_transport is NULL. And the receive code path de-references rs_transport for congestion update check. This can cause a panic. An rds_sock should not be added to the bind hash table before all the needed fields are set. Reported-by: syzbot+4b4f8163c2e246df3c4c@syzkaller.appspotmail.com Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-11 15:05:40 +01:00
Jouni Malinen	3e493173b7	mac80211: Do not send Layer 2 Update frame before authorization The Layer 2 Update frame is used to update bridges when a station roams to another AP even if that STA does not transmit any frames after the reassociation. This behavior was described in IEEE Std 802.11F-2003 as something that would happen based on MLME-ASSOCIATE.indication, i.e., before completing 4-way handshake. However, this IEEE trial-use recommended practice document was published before RSN (IEEE Std 802.11i-2004) and as such, did not consider RSN use cases. Furthermore, IEEE Std 802.11F-2003 was withdrawn in 2006 and as such, has not been maintained amd should not be used anymore. Sending out the Layer 2 Update frame immediately after association is fine for open networks (and also when using SAE, FT protocol, or FILS authentication when the station is actually authenticated by the time association completes). However, it is not appropriate for cases where RSN is used with PSK or EAP authentication since the station is actually fully authenticated only once the 4-way handshake completes after authentication and attackers might be able to use the unauthenticated triggering of Layer 2 Update frame transmission to disrupt bridge behavior. Fix this by postponing transmission of the Layer 2 Update frame from station entry addition to the point when the station entry is marked authorized. Similarly, send out the VLAN binding update only if the STA entry has already been authorized. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-11 14:59:26 +01:00
David S. Miller	c1b3ddf7c3	We have a number of changes, but things are settling down: * a fix in the new 6 GHz channel support * a fix for recent minstrel (rate control) updates for an infinite loop * handle interface type changes better wrt. management frame registrations (for management frames sent to userspace) * add in-BSS RX time to survey information * handle HW rfkill properly if !CONFIG_RFKILL * send deauth on IBSS station expiry, to avoid state mismatches * handle deferred crypto tailroom updates in mac80211 better when device restart happens * fix a spectre-v1 - really a continuation of a previous patch * advertise NL80211_CMD_UPDATE_FT_IES as supported if so * add some missing parsing in VHT extended NSS support * support HE in mac80211_hwsim * let mac80211 drivers determine the max MTU themselves along with the usual cleanups etc. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl148l4ACgkQB8qZga/f l8TafQ//Yjunnxq49ClKMDxyKwxIBvLRAMr4D3QwaWAcWUpar1/V0Ft/5glnHtbZ 7QptwbxVNUk/N68hYi8wlpMpvfzv/nShZD8QBNS1bk5E3Gng3yow03LhOx5iaYb2 KJXS1GE4jwkMD7Xn65+eMeb8rt1vEj8LleX91cguilq+y5YbcNFsP1nil2RtQyBU cf5i8CBu4I5rTBoFaRvcz2xn+blqPSm2/piA+yXjzFp9vyVmhD+FjR5T482u48pj wi/1zersGVUzBNElnZOKg67XPir1fcJqCfLILr7okPItWuYXHdVHGfn+arhimK4W dyIpv1EfCe0lwKl4VTdhXt1GwhKvWCc2Ja7lz/RnDisGq9CYPJNfW7jFgR3tw4eg SccnUhnxRIgD1V2KDcvsRadPo+2YsBJBmC61JRX1K6L3zpHLbktDyzxvTwCeVQ9A TQp5bmQcKqDoq+/60JNHI6IxMbmX/vc2PC7dENGWUkem/JEmSWBB1wcL9gsRkhVi c4uHyvFeXJaIV7YFA36hWCfa0fr+UUYxfzxudRSvxq/tTpayqBKu6fr3Pvv0SqTj /BKkezoIdJntClhJv4PcDkZMva4uMCPtHCero9eICX5J+4AanD6caRNefX6L0PfF DtN1sscVOy9zY2fV4tqvn3IASmOE5yB/dxjtKkMYyNHkZsGiDI0= =HAuT -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2019-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a number of changes, but things are settling down: * a fix in the new 6 GHz channel support * a fix for recent minstrel (rate control) updates for an infinite loop * handle interface type changes better wrt. management frame registrations (for management frames sent to userspace) * add in-BSS RX time to survey information * handle HW rfkill properly if !CONFIG_RFKILL * send deauth on IBSS station expiry, to avoid state mismatches * handle deferred crypto tailroom updates in mac80211 better when device restart happens * fix a spectre-v1 - really a continuation of a previous patch * advertise NL80211_CMD_UPDATE_FT_IES as supported if so * add some missing parsing in VHT extended NSS support * support HE in mac80211_hwsim * let mac80211 drivers determine the max MTU themselves along with the usual cleanups etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-11 14:57:17 +01:00
Denis Kenzior	c1d3ad84ea	cfg80211: Purge frame registrations on iftype change Currently frame registrations are not purged, even when changing the interface type. This can lead to potentially weird situations where frames possibly not allowed on a given interface type remain registered due to the type switching happening after registration. The kernel currently relies on userspace apps to actually purge the registrations themselves, this is not something that the kernel should rely on. Add a call to cfg80211_mlme_purge_registrations() to forcefully remove any registrations left over prior to switching the iftype. Cc: stable@vger.kernel.org Signed-off-by: Denis Kenzior <denkenz@gmail.com> Link: https://lore.kernel.org/r/20190828211110.15005-1-denkenz@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 10:45:10 +02:00
Masashi Honma	4b2c5a14cd	nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds commit `1222a16014` ("nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds") was incomplete and requires one more fix to prevent accessing to rssi_thresholds[n] because user can control rssi_thresholds[i] values to make i reach to n. For example, rssi_thresholds = {-400, -300, -200, -100} when last is -34. Cc: stable@vger.kernel.org Fixes: `1222a16014` ("nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Link: https://lore.kernel.org/r/20190908005653.17433-1-masashi.honma@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:33:29 +02:00
Wen Gong	06354665f9	mac80211: allow drivers to set max MTU Make it possibly for drivers to adjust the default max_mtu by storing it in the hardware struct and using that value for all interfaces. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/1567738137-31748-1-git-send-email-wgong@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:33:29 +02:00
zhong jiang	24f6d765c8	cfg80211: Do not compare with boolean in nl80211_common_reg_change_event With the help of boolinit.cocci, we use !nl80211_reg_change_event_fill instead of (nl80211_reg_change_event_fill == false). Meanwhile, Clean up the code. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Link: https://lore.kernel.org/r/1567657537-65472-1-git-send-email-zhongjiang@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:33:29 +02:00
Johannes Berg	4b08d1b6a9	mac80211: IBSS: send deauth when expiring inactive STAs When we expire an inactive station, try to send it a deauth. This helps if it's actually still around, and just has issues with beacon distribution (or we do), and it will not also remove us. Then, if we have shared state, this may not be reset properly, causing problems; for example, we saw a case where aggregation sessions weren't removed properly (due to the TX start being offloaded to firmware and it relying on deauth for stop), causing a lot of traffic to get lost due to the SN reset after remove/add of the peer. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-9-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:33:29 +02:00
Luca Coelho	753a9a729f	mac80211: don't check if key is NULL in ieee80211_key_link() We already assume that key is not NULL and dereference it in a few other places before we check whether it is NULL, so the check is unnecessary. Remove it. Fixes: `96fc6efb9a` ("mac80211: IEEE 802.11 Extended Key ID support") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-8-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:33:28 +02:00
Lior Cohen	624ff4b210	mac80211: clear crypto tx tailroom counter upon keys enable In case we got a fw restart while roaming from encrypted AP to non-encrypted one, we might end up with hitting a warning on the pending counter crypto_tx_tailroom_pending_dec having a non-zero value. The following comment taken from net/mac80211/key.c explains the rational for the delayed tailroom needed: /* * The reason for the delayed tailroom needed decrementing is to * make roaming faster: during roaming, all keys are first deleted * and then new keys are installed. The first new key causes the * crypto_tx_tailroom_needed_cnt to go from 0 to 1, which invokes * the cost of synchronize_net() (which can be slow). Avoid this * by deferring the crypto_tx_tailroom_needed_cnt decrementing on * key removal for a while, so if we roam the value is larger than * zero and no 0->1 transition happens. * * The cost is that if the AP switching was from an AP with keys * to one without, we still allocate tailroom while it would no * longer be needed. However, in the typical (fast) roaming case * within an ESS this usually won't happen. */ The next flow lead to the warning eventually reported as a bug: 1. Disconnect from encrypted AP 2. Set crypto_tx_tailroom_pending_dec = 1 for the key 3. Schedule work 4. Reconnect to non-encrypted AP 5. Add a new key, setting the tailroom counter = 1 6. Got FW restart while pending counter is set ---> hit the warning While on it, the ieee80211_reset_crypto_tx_tailroom() func was merged into its single caller ieee80211_reenable_keys (previously called ieee80211_enable_keys). Also, we reset the crypto_tx_tailroom_pending_dec and remove the counters warning as we just reset both. Signed-off-by: Lior Cohen <lior2.cohen@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-7-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:33:28 +02:00
Johannes Berg	1c9559734e	mac80211: remove unnecessary key condition When we reach this point, the key cannot be NULL. Remove the condition that suggests otherwise. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-6-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:33:28 +02:00
Johannes Berg	5462632488	mac80211: list features in WEP/TKIP disable in better order "HE/HT/VHT" is a bit confusing since really the order of development (and possible support) is different - change this to "HT/VHT/HE". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-4-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:13:42 +02:00
Johannes Berg	3cfe91c4c3	cfg80211: always shut down on HW rfkill When the RFKILL subsystem isn't available, then rfkill_blocked() always returns false. In the case of hardware rfkill this will be wrong though, as if the hardware reported being killed then it cannot operate any longer. Since we only ever call the rfkill_sync work in this case, just rename it to rfkill_block and always pass "true" for the blocked parameter, rather than passing rfkill_blocked(). We rely on the underlying driver to still reject any new attempt to bring up the device by itself. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-2-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:13:26 +02:00
Mordechay Goodstein	e5c0b0fff6	mac80211: vht: add support VHT EXT NSS BW in parsing VHT This fixes was missed in parsing the vht capabilities max bw support. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Fixes: `e80d642552` ("mac80211: copy VHT EXT NSS BW Support/Capable data to station") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830114057.22197-1-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:13:03 +02:00
Arend van Spriel	df5d7a88bc	cfg80211: fix boundary value in ieee80211_frequency_to_channel() The boundary value used for the 6G band was incorrect as it would result in invalid 6G channel number for certain frequencies. Reported-by: Amar Singhal <asinghal@codeaurora.org> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1567510772-24263-1-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-09-11 09:12:55 +02:00
Pablo Neira Ayuso	be2861dc36	netfilter: nft_{fwd,dup}_netdev: add offload support This patch adds support for packet mirroring and redirection. The nft_fwd_dup_netdev_offload() function configures the flow_action object for the fwd and the dup actions. Extend nft_flow_rule_destroy() to release the net_device object when the flow_rule object is released, since nft_fwd_dup_netdev_offload() bumps the net_device reference counter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: wenxu <wenxu@ucloud.cn>	2019-09-10 22:44:29 +02:00
Fernando Fernandez Mancera	ee394f96ad	netfilter: nft_synproxy: add synproxy stateful object support Register a new synproxy stateful object type into the stateful object infrastructure. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-10 22:35:37 +02:00
Xin Long	f794dc2304	sctp: fix the missing put_user when dumping transport thresholds This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump a transport thresholds info. Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds. Fixes: `8add543e36` ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-10 18:32:28 +01:00
Cong Wang	d4d6ec6dac	sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero, it would make no progress inside the loop in hhf_dequeue() thus kernel would get stuck. Fix this by checking this corner case in hhf_change(). Fixes: `10239edf86` ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Terry Lam <vtlam@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-10 18:31:00 +01:00
Cong Wang	8b142a00ed	net_sched: check cops->tcf_block in tc_bind_tclass() At least sch_red and sch_tbf don't implement ->tcf_block() while still have a non-zero tc "class". Instead of adding nop implementations to each of such qdisc's, we can just relax the check of cops->tcf_block() in tc_bind_tclass(). They don't support TC filter anyway. Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-10 18:28:56 +01:00
Dirk van der Merwe	5bbd21df5a	devlink: add 'reset_dev_on_drv_probe' param Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the device reset policy on driver probe. This parameter is useful in conjunction with the existing 'fw_load_policy' parameter. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-10 17:29:26 +01:00
Nicolas Dichtel	94a72b3f02	bridge/mdb: remove wrong use of NLM_F_MULTI NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end. In fact, NLMSG_DONE is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: `949f1e39a6` ("bridge: mdb: notify on router port add and del") CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-10 09:10:53 +01:00
Pablo Neira Ayuso	3474a2c62f	netfilter: nf_tables_offload: move indirect flow_block callback logic to core Add nft_offload_init() and nft_offload_exit() function to deal with the init and the exit path of the offload infrastructure. Rename nft_indr_block_get_and_ing_cmd() to nft_indr_block_cb(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-08 19:18:04 +02:00
Arnd Bergmann	b44492afd2	netfilter: nf_tables_offload: avoid excessive stack usage The nft_offload_ctx structure is much too large to put on the stack: net/netfilter/nf_tables_offload.c:31:23: error: stack frame size of 1200 bytes in function 'nft_flow_rule_create' [-Werror,-Wframe-larger-than=] Use dynamic allocation here, as we do elsewhere in the same function. Fixes: `c9626a2cbd` ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-08 18:16:59 +02:00
Dan Carpenter	b74ae9618b	netfilter: nf_tables: Fix an Oops in nf_tables_updobj() error handling The "newobj" is an error pointer so we can't pass it to kfree(). It doesn't need to be freed so we can remove that and I also renamed the error label. Fixes: `d62d0ba97b` ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-08 18:10:13 +02:00
Jakub Kicinski	e681cc603a	net/tls: align non temporal copy to cache lines Unlike normal TCP code TLS has to touch the cache lines it copies into to fill header info. On memory-heavy workloads having non temporal stores and normal accesses targeting the same cache line leads to significant overhead. Measured 3% overhead running 3600 round robin connections with additional memory heavy workload. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 18:10:34 +02:00
Jakub Kicinski	e7b159a48b	net/tls: remove the record tail optimization For TLS device offload the tag/message authentication code are filled in by the device. The kernel merely reserves space for them. Because device overwrites it, the contents of the tag make do no matter. Current code tries to save space by reusing the header as the tag. This, however, leads to an additional frag being created and defeats buffer coalescing (which trickles all the way down to the drivers). Remove this optimization, and try to allocate the space for the tag in the usual way, leave the memory uninitialized. If memory allocation fails rewind the record pointer so that we use the already copied user data as tag. Note that the optimization was actually buggy, as the tag for TLS 1.2 is 16 bytes, but header is just 13, so the reuse may had looked past the end of the page.. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 18:10:34 +02:00
Jakub Kicinski	d4774ac0d4	net/tls: use RCU for the adder to the offload record list All modifications to TLS record list happen under the socket lock. Since records form an ordered queue readers are only concerned about elements being removed, additions can happen concurrently. Use RCU primitives to ensure the correct access types (READ_ONCE/WRITE_ONCE). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 18:10:34 +02:00
Jakub Kicinski	7ccd451912	net/tls: unref frags in order It's generally more cache friendly to walk arrays in order, especially those which are likely not in cache. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 18:10:34 +02:00
David S. Miller	fcd8c62709	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2019-09-06 Here's the main bluetooth-next pull request for the 5.4 kernel. - Cleanups & fixes to btrtl driver - Fixes for Realtek devices in btusb, e.g. for suspend handling - Firmware loading support for BCM4345C5 - hidp_send_message() return value handling fixes - Added support for utilizing Fast Advertising Interval - Various other minor cleanups & fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 18:07:27 +02:00
Shmulik Ladkani	3dcbdb134f	net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list Historically, support for frag_list packets entering skb_segment() was limited to frag_list members terminating on exact same gso_size boundaries. This is verified with a BUG_ON since commit `89319d3801` ("net: Add frag_list support to skb_segment"), quote: As such we require all frag_list members terminate on exact MSS boundaries. This is checked using BUG_ON. As there should only be one producer in the kernel of such packets, namely GRO, this requirement should not be difficult to maintain. However, since commit `6578171a7f` ("bpf: add bpf_skb_change_proto helper"), the "exact MSS boundaries" assumption no longer holds: An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but leaves the frag_list members as originally merged by GRO with the original 'gso_size'. Example of such programs are bpf-based NAT46 or NAT64. This lead to a kernel BUG_ON for flows involving: - GRO generating a frag_list skb - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room() - skb_segment() of the skb See example BUG_ON reports in [0]. In commit `13acc94eff` ("net: permit skb_segment on head_frag frag_list skb"), skb_segment() was modified to support the "gso_size mangling" case of a frag_list GRO'ed skb, but only for frag_list members having head_frag==true (having a page-fragment head). Alas, GRO packets having frag_list members with a linear kmalloced head (head_frag==false) still hit the BUG_ON. This commit adds support to skb_segment() for a 'head_skb' packet having a frag_list whose members are non head_frag, with gso_size mangled, by disabling SG and thus falling-back to copying the data from the given 'head_skb' into the generated segmented skbs - as suggested by Willem de Bruijn [1]. Since this approach involves the penalty of skb_copy_and_csum_bits() when building the segments, care was taken in order to enable this solution only when required: - untrusted gso_size, by testing SKB_GSO_DODGY is set (SKB_GSO_DODGY is set by any gso_size mangling functions in net/core/filter.c) - the frag_list is non empty, its item is a non head_frag, and the headlen of the given 'head_skb' does not match the gso_size. [0] https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/ https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/ [1] https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/ Fixes: `6578171a7f` ("bpf: add bpf_skb_change_proto helper") Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 17:58:48 +02:00
Hangbin Liu	0079ad8e8d	ipmr: remove hard code cache_resolve_queue_len limit This is a re-post of previous patch wrote by David Miller[1]. Phil Karn reported[2] that on busy networks with lots of unresolved multicast routing entries, the creation of new multicast group routes can be extremely slow and unreliable. The reason is we hard-coded multicast route entries with unresolved source addresses(cache_resolve_queue_len) to 10. If some multicast route never resolves and the unresolved source addresses increased, there will be no ability to create new multicast route cache. To resolve this issue, we need either add a sysctl entry to make the cache_resolve_queue_len configurable, or just remove cache_resolve_queue_len limit directly, as we already have the socket receive queue limits of mrouted socket, pointed by David. >From my side, I'd perfer to remove the cache_resolve_queue_len limit instead of creating two more(IPv4 and IPv6 version) sysctl entry. [1] https://lkml.org/lkml/2018/7/22/11 [2] https://lkml.org/lkml/2018/7/21/343 v3: instead of remove cache_resolve_queue_len totally, let's only remove the hard code limit when allocate the unresolved cache, as Eric Dumazet suggested, so we don't need to re-count it in other places. v2: hold the mfc_unres_lock while walking the unresolved list in queue_count(), as Nikolay Aleksandrov remind. Reported-by: Phil Karn <karn@ka9q.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 17:49:00 +02:00
Maciej Żenczykowski	8652f17c65	ipv6: addrconf_f6i_alloc - fix non-null pointer check to !IS_ERR() Fixes a stupid bug I recently introduced... ip6_route_info_create() returns an ERR_PTR(err) and not a NULL on error. Fixes: `d55a2e374a` ("net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)'") Cc: David Ahern <dsahern@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 17:46:44 +02:00
Eric Dumazet	b58662a5f7	tcp: ulp: fix possible crash in tcp_diag_get_aux_size() tcp_diag_get_aux_size() can be called with sockets in any state. icsk_ulp_ops is only present for full sockets. For SYN_RECV or TIME_WAIT ones we would access garbage. Fixes: `61723b3932` ("tcp: ulp: add functions to dump ulp-specific information") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Luke Hsiao <lukehsiao@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Cc: Davide Caratti <dcaratti@redhat.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 17:32:28 +02:00
Jiri Pirko	3dd97a0827	net: fib_notifier: move fib_notifier_ops from struct net into per-net struct No need for fib_notifier_ops to be in struct net. It is used only by fib_notifier as a private data. Use net_generic to introduce per-net fib_notifier struct and move fib_notifier_ops there. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 17:28:22 +02:00
David S. Miller	b8f6a0eeb9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add nft_reg_store64() and nft_reg_load64() helpers, from Ander Juaristi. 2) Time matching support, also from Ander Juaristi. 3) VLAN support for nfnetlink_log, from Michael Braun. 4) Support for set element deletions from the packet path, also from Ander. 5) Remove __read_mostly from conntrack spinlock, from Li RongQing. 6) Support for updating stateful objects, this also includes the initial client for this infrastructure: the quota extension. A follow up fix for the control plane also comes in this batch. Patches from Fernando Fernandez Mancera. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-07 16:31:30 +02:00
Sami Tolvanen	a2c11b0341	kcm: use BPF_PROG_RUN Instead of invoking struct bpf_prog::bpf_func directly, use the BPF_PROG_RUN macro. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-09-06 10:04:31 -07:00
David S. Miller	1e46c09ec1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Add the ability to use unaligned chunks in the AF_XDP umem. By relaxing where the chunks can be placed, it allows to use an arbitrary buffer size and place whenever there is a free address in the umem. Helps more seamless DPDK AF_XDP driver integration. Support for i40e, ixgbe and mlx5e, from Kevin and Maxim. 2) Addition of a wakeup flag for AF_XDP tx and fill rings so the application can wake up the kernel for rx/tx processing which avoids busy-spinning of the latter, useful when app and driver is located on the same core. Support for i40e, ixgbe and mlx5e, from Magnus and Maxim. 3) bpftool fixes for printf()-like functions so compiler can actually enforce checks, bpftool build system improvements for custom output directories, and addition of 'bpftool map freeze' command, from Quentin. 4) Support attaching/detaching XDP programs from 'bpftool net' command, from Daniel. 5) Automatic xskmap cleanup when AF_XDP socket is released, and several barrier/{read,write}_once fixes in AF_XDP code, from Björn. 6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf inclusion as well as libbpf versioning improvements, from Andrii. 7) Several new BPF kselftests for verifier precision tracking, from Alexei. 8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya. 9) And more BPF kselftest improvements all over the place, from Stanislav. 10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub. 11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan. 12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni. 13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin. 14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar. 15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari, Peter, Wei, Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-06 16:49:17 +02:00
Dan Elkouby	8bb3537095	Bluetooth: hidp: Fix assumptions on the return value of hidp_send_message hidp_send_message was changed to return non-zero values on success, which some other bits did not expect. This caused spurious errors to be propagated through the stack, breaking some drivers, such as hid-sony for the Dualshock 4 in Bluetooth mode. As pointed out by Dan Carpenter, hid-microsoft directly relied on that assumption as well. Fixes: `48d9cc9d85` ("Bluetooth: hidp: Let hidp_send_message return number of queued bytes") Signed-off-by: Dan Elkouby <streetwalkermc@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-09-06 15:55:40 +02:00
Eric Dumazet	b88dd52c62	net: sched: fix reordering issues Whenever MQ is not used on a multiqueue device, we experience serious reordering problems. Bisection found the cited commit. The issue can be described this way : - A single qdisc hierarchy is shared by all transmit queues. (eg : tc qdisc replace dev eth0 root fq_codel) - When/if try_bulk_dequeue_skb_slow() dequeues a packet targetting a different transmit queue than the one used to build a packet train, we stop building the current list and save the 'bad' skb (P1) in a special queue. (bad_txq) - When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this skb (P1), it checks if the associated transmit queues is still in frozen state. If the queue is still blocked (by BQL or NIC tx ring full), we leave the skb in bad_txq and return NULL. - dequeue_skb() calls q->dequeue() to get another packet (P2) The other packet can target the problematic queue (that we found in frozen state for the bad_txq packet), but another cpu just ran TX completion and made room in the txq that is now ready to accept new packets. - Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent at next round. In practice P2 is the lead of a big packet train (P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/ To solve this problem, we have to block the dequeue process as long as the first packet in bad_txq can not be sent. Reordering issues disappear and no side effects have been seen. Fixes: `a53851e2c3` ("net: sched: explicit locking in gso_cpu fallback") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-06 15:12:33 +02:00
David S. Miller	2e9550ed67	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2019-09-05 1) Several xfrm interface fixes from Nicolas Dichtel: - Avoid an interface ID corruption on changelink. - Fix wrong intterface names in the logs. - Fix a list corruption when changing network namespaces. - Fix unregistation of the underying phydev. 2) Fix a potential warning when merging xfrm_plocy nodes. From Florian Westphal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-06 15:09:16 +02:00
David Dai	d1967e495a	net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate For high speed adapter like Mellanox CX-5 card, it can reach upto 100 Gbits per second bandwidth. Currently htb already supports 64bit rate in tc utility. However police action rate and peakrate are still limited to 32bit value (upto 32 Gbits per second). Add 2 new attributes TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support so that tc utility can use them for 64bit rate and peakrate value to break the 32bit limit, and still keep the backward binary compatibility. Tested-by: David Dai <zdai@linux.vnet.ibm.com> Signed-off-by: David Dai <zdai@linux.vnet.ibm.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-06 15:02:16 +02:00
Paul Blakey	95a7233c45	net: openvswitch: Set OvS recirc_id from tc chain index Offloaded OvS datapath rules are translated one to one to tc rules, for example the following simplified OvS rule: recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2) Will be translated to the following tc rule: $ tc filter add dev dev1 ingress \ prio 1 chain 0 proto ip \ flower tcp ct_state -trk \ action ct pipe \ action goto chain 2 Received packets will first travel though tc, and if they aren't stolen by it, like in the above rule, they will continue to OvS datapath. Since we already did some actions (action ct in this case) which might modify the packets, and updated action stats, we would like to continue the proccessing with the correct recirc_id in OvS (here recirc_id(2)) where we left off. To support this, introduce a new skb extension for tc, which will be used for translating tc chain to ovs recirc_id to handle these miss cases. Last tc chain index will be set by tc goto chain action and read by OvS datapath. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-06 14:59:18 +02:00
Al Viro	533770cc0a	new helper: get_tree_keyed() For vfs_get_keyed_super users. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:22 -04:00
Gustavo A. R. Silva	72bb169e02	Bluetooth: mgmt: Use struct_size() helper One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct mgmt_rp_get_connections { ... struct mgmt_addr_info addr[0]; } __packed; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. So, replace the following form: sizeof(rp) + (i sizeof(struct mgmt_addr_info)); with: struct_size(rp, addr, i) Also, notice that, in this case, variable rp_len is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-09-05 17:27:22 +02:00
Nishka Dasgupta	569428dabc	Bluetooth: 6lowpan: Make variable header_ops constant Static variable header_ops, of type header_ops, is used only once, when it is assigned to field header_ops of a variable having type net_device. This corresponding field is declared as const in the definition of net_device. Hence make header_ops constant as well to protect it from unnecessary modification. Issue found with Coccinelle. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-09-05 17:27:21 +02:00
Spoorthi Ravishankar Koppad	ad4a6795e0	Bluetooth: Add support for utilizing Fast Advertising Interval Changes made to add support for fast advertising interval as per core 4.1 specification, section 9.3.11.2. A peripheral device entering any of the following GAP modes and sending either non-connectable advertising events or scannable undirected advertising events should use adv_fast_interval2 (100ms - 150ms) for adv_fast_period(30s). - Non-Discoverable Mode - Non-Connectable Mode - Limited Discoverable Mode - General Discoverable Mode Signed-off-by: Spoorthi Ravishankar Koppad <spoorthix.k@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-09-05 17:27:21 +02:00
Björn Töpel	25dc18ff9b	xsk: lock the control mutex in sock_diag interface When accessing the members of an XDP socket, the control mutex should be held. This commit fixes that. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: `a36b38aa2a` ("xsk: add sock_diag interface for AF_XDP") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-09-05 14:11:52 +02:00
Björn Töpel	42fddcc7c6	xsk: use state member for socket synchronization Prior the state variable was introduced by Ilya, the dev member was used to determine whether the socket was bound or not. However, when dev was read, proper SMP barriers and READ_ONCE were missing. In order to address the missing barriers and READ_ONCE, we start using the state variable as a point of synchronization. The state member read/write is paired with proper SMP barriers, and from this follows that the members described above does not need READ_ONCE if used in conjunction with state check. In all syscalls and the xsk_rcv path we check if state is XSK_BOUND. If that is the case we do a SMP read barrier, and this implies that the dev, umem and all rings are correctly setup. Note that no READ_ONCE are needed for these variable if used when state is XSK_BOUND (plus the read barrier). To summarize: The members struct xdp_sock members dev, queue_id, umem, fq, cq, tx, rx, and state were read lock-less, with incorrect barriers and missing {READ, WRITE}_ONCE. Now, umem, fq, cq, tx, rx, and state are read lock-less. When these members are updated, WRITE_ONCE is used. When read, READ_ONCE are only used when read outside the control mutex (e.g. mmap) or, not synchronized with the state member (XSK_BOUND plus smp_rmb()) Note that dev and queue_id do not need a WRITE_ONCE or READ_ONCE, due to the introduce state synchronization (XSK_BOUND plus smp_rmb()). Introducing the state check also fixes a race, found by syzcaller, in xsk_poll() where umem could be accessed when stale. Suggested-by: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com Fixes: `77cd0d7b3f` ("xsk: add support for need_wakeup flag in AF_XDP rings") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-09-05 14:11:52 +02:00
Björn Töpel	9764f4b301	xsk: avoid store-tearing when assigning umem The umem member of struct xdp_sock is read outside of the control mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid potential store-tearing. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: `423f38329d` ("xsk: add umem fill queue support and mmap") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-09-05 14:11:52 +02:00
Björn Töpel	94a997637c	xsk: avoid store-tearing when assigning queues Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid potential store-tearing. These members are read outside of the control mutex in the mmap implementation. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: `37b076933a` ("xsk: add missing write- and data-dependency barrier") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-09-05 14:11:52 +02:00
Fernando Fernandez Mancera	aa4095a156	netfilter: nf_tables: fix possible null-pointer dereference in object update Not all objects have an update operation. If the object type doesn't implement an update operation and the user tries to update it will hit EOPNOTSUPP. Fixes: `d62d0ba97b` ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-05 13:40:27 +02:00
Donald Sharp	7bdf4de126	net: Properly update v4 routes with v6 nexthop When creating a v4 route that uses a v6 nexthop from a nexthop group. Allow the kernel to properly send the nexthop as v6 via the RTA_VIA attribute. Broken behavior: $ ip nexthop add via fe80::9 dev eth0 $ ip nexthop show id 1 via fe80::9 dev eth0 scope link $ ip route add 4.5.6.7/32 nhid 1 $ ip route show default via 10.0.2.2 dev eth0 4.5.6.7 nhid 1 via 254.128.0.0 dev eth0 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 $ Fixed behavior: $ ip nexthop add via fe80::9 dev eth0 $ ip nexthop show id 1 via fe80::9 dev eth0 scope link $ ip route add 4.5.6.7/32 nhid 1 $ ip route show default via 10.0.2.2 dev eth0 4.5.6.7 nhid 1 via inet6 fe80::9 dev eth0 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 $ v2, v3: Addresses code review comments from David Ahern Fixes: `dcb1ecb50e` (“ipv4: Prepare for fib6_nh from a nexthop object”) Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 12:35:58 +02:00
Andy Shevchenko	a8a213cbed	pppoatm: use %ph to print small buffer Use %ph format to print small buffer as hex string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 12:33:28 +02:00
David S. Miller	44c40910b6	linux-can-next-for-5.4-20190904 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl1vrJITHG1rbEBwZW5n dXRyb25peC5kZQAKCRBaxiGjkeSdIC8BB/98XcWiaInD+SM6UjD2dVd1r0zhPKJS WBK58G81+op3YP4DY8Iy+C24uZBlSlutVGoD/PIrZF39xXsnOtJuMVHC4LvtdADC 30uI/61JQNEjuX2AiTFudqDvYjZZKZ28HLqEnO2pWk3dMVL3+fkS3i7VQR7KJ/Gr BYM6EzCdkbuWW/zsAVbKLJ8NswVmcdjP7eSK+exKppoWMtgCglZw1X6iP5YXDnbK h3dGs687u8RfUra7j7vgnJzyQU4draMPsabaLDT5qw1PgYQ3k8MTVMBlULR0+HHO qkBqumRwfOxay0z0XOgRuWrICKTH/b0SRLp3H53ZyfDo6+4TC9KGHRgX =gwfZ -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.4-20190904' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2019-09-04 j1939 this is a pull request for net-next/master consisting of 21 patches. the first 12 patches are by me and target the CAN core infrastructure. They clean up the names of variables , structs and struct members, convert can_rx_register() to use max() instead of open coding it and remove unneeded code from the can_pernet_exit() callback. The next three patches are also by me and they introduce and make use of the CAN midlayer private structure. It is used to hold protocol specific per device data structures. The next patch is by Oleksij Rempel, switches the &net->can.rcvlists_lock from a spin_lock() to a spin_lock_bh(), so that it can be used from NAPI (soft IRQ) context. The next 4 patches are by Kurt Van Dijck, he first updates his email address via mailmap and then extends sockaddr_can to include j1939 members. The final patch is the collective effort of many entities (The j1939 authors: Oliver Hartkopp, Bastian Stender, Elenita Hinds, kbuild test robot, Kurt Van Dijck, Maxime Jayat, Robin van der Gracht, Oleksij Rempel, Marc Kleine-Budde). It adds support of SAE J1939 protocol to the CAN networking stack. SAE J1939 is the vehicle bus recommended practice used for communication and diagnostics among vehicle components. Originating in the car and heavy-duty truck industry in the United States, it is now widely used in other parts of the world. P.S.: This pull request doesn't invalidate my last pull request: "pull-request: can-next 2019-09-03". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 12:17:50 +02:00
zhong jiang	da3a3b653b	net: mpoa: Use kzfree rather than its implementation. Use kzfree instead of memset() + kfree(). Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 12:06:04 +02:00
zhong jiang	60b3990c2c	sunrpc: Use kzfree rather than its implementation. Use kzfree instead of memset() + kfree(). Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 12:06:04 +02:00
David Ahern	4255ff0544	ipv6: Fix RTA_MULTIPATH with nexthop objects A change to the core nla helpers was missed during the push of the nexthop changes. rt6_fill_node_nexthop should be calling nla_nest_start_noflag not nla_nest_start. Currently, iproute2 does not print multipath data because of parsing issues with the attribute. Fixes: `f88d8ea67f` ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 11:59:39 +02:00
John Fastabend	44580a0118	net: sock_map, fix missing ulp check in sock hash case sock_map and ULP only work together when ULP is loaded after the sock map is loaded. In the sock_map case we added a check for this to fail the load if ULP is already set. However, we missed the check on the sock_hash side. Add a ULP check to the sock_hash update path. Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: syzbot+7a6ee4d0078eac6bf782@syzkaller.appspotmail.com Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 11:56:19 +02:00
Xin Long	42dec1dbe3	tipc: add NULL pointer check before calling kfree_rcu Unlike kfree(p), kfree_rcu(p, rcu) won't do NULL pointer check. When tipc_nametbl_remove_publ returns NULL, the panic below happens: BUG: unable to handle kernel NULL pointer dereference at 0000000000000068 RIP: 0010:__call_rcu+0x1d/0x290 Call Trace: <IRQ> tipc_publ_notify+0xa9/0x170 [tipc] tipc_node_write_unlock+0x8d/0x100 [tipc] tipc_node_link_down+0xae/0x1d0 [tipc] tipc_node_check_dest+0x3ea/0x8f0 [tipc] ? tipc_disc_rcv+0x2c7/0x430 [tipc] tipc_disc_rcv+0x2c7/0x430 [tipc] ? tipc_rcv+0x6bb/0xf20 [tipc] tipc_rcv+0x6bb/0xf20 [tipc] ? ip_route_input_slow+0x9cf/0xb10 tipc_udp_recv+0x195/0x1e0 [tipc] ? tipc_udp_is_known_peer+0x80/0x80 [tipc] udp_queue_rcv_skb+0x180/0x460 udp_unicast_rcv_skb.isra.56+0x75/0x90 __udp4_lib_rcv+0x4ce/0xb90 ip_local_deliver_finish+0x11c/0x210 ip_local_deliver+0x6b/0xe0 ? ip_rcv_finish+0xa9/0x410 ip_rcv+0x273/0x362 Fixes: `97ede29e80` ("tipc: convert name table read-write lock to RCU") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 09:58:49 +02:00
Michael S. Tsirkin	f4d7c8e3da	vsock/virtio: a better comment on credit update The comment we have is just repeating what the code does. Include the reason for the condition instead. Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 09:53:01 +02:00
Jakub Kicinski	6e3d02b670	net/tls: dedup the record cleanup If retransmit record hint fall into the cleanup window we will free it by just walking the list. No need to duplicate the code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 09:49:49 +02:00
Jakub Kicinski	be2fbc155f	net/tls: clean up the number of #ifdefs for CONFIG_TLS_DEVICE TLS code has a number of #ifdefs which make the code a little harder to follow. Recent fixes removed the ifdef around the TLS_HW define, so we can switch to the often used pattern of defining tls_device functions as empty static inlines in the header when CONFIG_TLS_DEVICE=n. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 09:49:49 +02:00
Jakub Kicinski	3544c98acd	net/tls: narrow down the critical area of device_offload_lock On setsockopt path we need to hold device_offload_lock from the moment we check netdev is up until the context is fully ready to be added to the tls_device_list. No need to hold it around the get_netdev_for_sock(). Change the code and remove the confusing comment. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 09:49:49 +02:00
Jakub Kicinski	90962b4894	net/tls: don't jump to return Reusing parts of error path for normal exit will make next commit harder to read, untangle the two. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 09:49:49 +02:00
Jakub Kicinski	be7bbea114	net/tls: use the full sk_proto pointer Since we already have the pointer to the full original sk_proto stored use that instead of storing all individual callback pointers as well. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 09:49:49 +02:00
Dave Taht	842841ece5	Convert usage of IN_MULTICAST to ipv4_is_multicast IN_MULTICAST's primary intent is as a uapi macro. Elsewhere in the kernel we use ipv4_is_multicast consistently. This patch unifies linux's multicast checks to use that function rather than this macro. Signed-off-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 09:38:32 +02:00
Colin Ian King	9367fa0841	net/sched: cbs: remove redundant assignment to variable port_rate Variable port_rate is being initialized with a value that is never read and is being re-assigned a little later on. The assignment is redundant and hence can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 09:37:02 +02:00
David S. Miller	6a87691c40	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2019-09-05 Here are a few more Bluetooth fixes for 5.3. I hope they can still make it. There's one USB ID addition for btusb, two reverts due to discovered regressions, and two other important fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 08:31:53 +02:00
Marcel Holtmann	68d19d7d99	Revert "Bluetooth: validate BLE connection interval updates" This reverts commit `c49a8682fc`. There are devices which require low connection intervals for usable operation including keyboards and mice. Forcing a static connection interval for these types of devices has an impact in latency and causes a regression. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2019-09-05 09:02:59 +03:00
Maciej Żenczykowski	d55a2e374a	net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others) There is a subtle change in behaviour introduced by: commit `c7a1ce397a` 'ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create' Before that patch /proc/net/ipv6_route includes: 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo Afterwards /proc/net/ipv6_route includes: 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80240001 lo ie. the above commit causes the ::1/128 local (automatic) route to be flagged with RTF_ADDRCONF (0x040000). AFAICT, this is incorrect since these routes are not coming from RA's. As such, this patch restores the old behaviour. Fixes: `c7a1ce397a` ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create") Cc: David Ahern <dsahern@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 00:31:50 +02:00
Xin Long	10eb56c582	sctp: use transport pf_retrans in sctp_do_8_2_transport_strike Transport should use its own pf_retrans to do the error_count check, instead of asoc's. Otherwise, it's meaningless to make pf_retrans per transport. Fixes: `5aa93bcf66` ("sctp: Implement quick failover draft from tsvwg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 00:29:21 +02:00
David Howells	591328948b	rxrpc: Fix misplaced traceline There's a misplaced traceline in rxrpc_input_packet() which is looking at a packet that just got released rather than the replacement packet. Fix this by moving the traceline after the assignment that moves the new packet pointer to the actual packet pointer. Fixes: `d0d5c0cd1e` ("rxrpc: Use skb_unshare() rather than skb_cow_data()") Reported-by: Hillf Danton <hdanton@sina.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-05 00:24:58 +02:00
The j1939 authors	9d71dd0c70	can: add support of SAE J1939 protocol SAE J1939 is the vehicle bus recommended practice used for communication and diagnostics among vehicle components. Originating in the car and heavy-duty truck industry in the United States, it is now widely used in other parts of the world. J1939, ISO 11783 and NMEA 2000 all share the same high level protocol. SAE J1939 can be considered the replacement for the older SAE J1708 and SAE J1587 specifications. Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Bastian Stender <bst@pengutronix.de> Signed-off-by: Elenita Hinds <ecathinds@gmail.com> Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr> Signed-off-by: Robin van der Gracht <robin@protonic.nl> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 14:22:33 +02:00
Kurt Van Dijck	9868b5d44f	can: introduce CAN_REQUIRED_SIZE macro The size of this structure will be increased with J1939 support. To stay binary compatible, the CAN_REQUIRED_SIZE macro is introduced for existing CAN protocols. Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:15 +02:00
Oleksij Rempel	24efc6d36d	can: af_can: use spin_lock_bh() for &net->can.rcvlists_lock The can_rx_unregister() can be called from NAPI (soft IRQ) context, at least by j1939 stack. This leads to potential dead lock with &net->can.rcvlists_lock called from can_rx_register: =============================================================================== WARNING: inconsistent lock state 4.19.0-20181029-1-g3e67f95ba0d3 #3 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. testj1939/224 [HC0[0]:SC1[1]:HE1:SE0] takes: 1ad0fda3 (&(&net->can.rcvlists_lock)->rlock){+.?.}, at: can_rx_unregister+0x4c/0x1ac {SOFTIRQ-ON-W} state was registered at: lock_acquire+0xd0/0x1f4 _raw_spin_lock+0x30/0x40 can_rx_register+0x5c/0x14c j1939_netdev_start+0xdc/0x1f8 j1939_sk_bind+0x18c/0x1c8 __sys_bind+0x70/0xb0 sys_bind+0x10/0x14 ret_fast_syscall+0x0/0x28 0xbedc9b64 irq event stamp: 2440 hardirqs last enabled at (2440): [<c01302c0>] __local_bh_enable_ip+0xac/0x184 hardirqs last disabled at (2439): [<c0130274>] __local_bh_enable_ip+0x60/0x184 softirqs last enabled at (2412): [<c08b0bf4>] release_sock+0x84/0xa4 softirqs last disabled at (2415): [<c013055c>] irq_exit+0x100/0x1b0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&net->can.rcvlists_lock)->rlock); <Interrupt> lock(&(&net->can.rcvlists_lock)->rlock); * DEADLOCK * 2 locks held by testj1939/224: #0: 168eb13b (rcu_read_lock){....}, at: netif_receive_skb_internal+0x3c/0x350 #1: 168eb13b (rcu_read_lock){....}, at: can_receive+0x88/0x1c0 =============================================================================== To avoid this situation, we should use spin_lock_bh() instead of spin_lock(). Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:15 +02:00
Marc Kleine-Budde	bdfb5765e4	can: af_can: remove NULL-ptr checks from users of can_dev_rcv_lists_find() Since using the "struct can_ml_priv" for the per device "struct dev_rcv_lists" the call can_dev_rcv_lists_find() cannot fail anymore. This patch simplifies af_can by removing the NULL pointer checks from the dev_rcv_lists returned by can_dev_rcv_lists_find(). Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:15 +02:00
Marc Kleine-Budde	8df9ffb888	can: make use of preallocated can_ml_priv for per device struct can_dev_rcv_lists This patch removes the old method of allocating the per device protocol specific memory via a netdevice_notifier. This had the drawback, that the allocation can fail, leading to a lot of null pointer checks in the code. This also makes the live cycle management of this memory quite complicated. This patch switches from the allocating the struct can_dev_rcv_lists in a NETDEV_REGISTER call to using the dev->ml_priv, which is allocated by the driver since the previous patch. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:15 +02:00
Marc Kleine-Budde	ffd956eef6	can: introduce CAN midlayer private and allocate it automatically This patch introduces the CAN midlayer private structure ("struct can_ml_priv") which should be used to hold protocol specific per device data structures. For now it's only member is "struct can_dev_rcv_lists". The CAN midlayer private is allocated via alloc_netdev()'s private and assigned to "struct net_device::ml_priv" during device creation. This is done transparently for CAN drivers using alloc_candev(). The slcan, vcan and vxcan drivers which are not using alloc_candev() have been adopted manually. The memory layout of the netdev_priv allocated via alloc_candev() will looke like this: +-------------------------+ \| driver's priv \| +-------------------------+ \| struct can_ml_priv \| +-------------------------+ \| array of struct sk_buff \| +-------------------------+ Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	3f15035606	can: af_can: can_pernet_exit(): no need to iterate over and cleanup registered CAN devices The networking core takes care and unregisters every network device in a namespace before calling the can_pernet_exit() hook. This patch removes the unneeded cleanup. Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	e2586a5796	can: af_can: can_rx_register(): use max() instead of open coding it This patch replaces an open coded max by the proper kernel define max(). Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	6625a18e9f	can: af_can: give variable holding the CAN receiver and the receiver list a sensible name This patch gives the variables holding the CAN receiver and the receiver list a better name by renaming them from "r to "rcv" and "rl" to "recv_list". Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	fac785009a	can: af_can: rename find_dev_rcv_lists() to can_dev_rcv_lists_find() This patch add the commonly used prefix "can_" to the find_dev_rcv_lists() function and moves the "find" to the end, as the function returns a struct can_dev_rcv_list. This improves the overall readability of the code. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	3ee6d2bebe	can: af_can: rename find_rcv_list() to can_rcv_list_find() This patch add the commonly used prefix "can_" to the find_rcv_list() function and add the "find" to the end, as the function returns a struct rcv_list. This improves the overall readability of the code. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	ff7fbea4c1	can: proc: give variable holding the CAN per device receive lists a sensible name This patch gives the variables holding the CAN per device receive filter lists a better name by renaming them from "d" to "dev_rcv_lists". Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	56be1d52fc	can: af_can: give variable holding the CAN per device receive lists a sensible name This patch gives the variables holding the CAN receive filter lists a better name by renaming them from "d" to "dev_rcv_lists". Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	564577dfee	can: netns: remove "can_" prefix from members struct netns_can This patch improves the code reability by removing the redundant "can_" prefix from the members of struct netns_can (as the struct netns_can itself is the member "can" of the struct net.) The conversion is done with: sed -i \ -e "s/struct can_dev_rcv_lists \can_rx_alldev_list;/struct can_dev_rcv_lists rx_alldev_list;/" \ -e "s/spinlock_t can_rcvlists_lock;/spinlock_t rcvlists_lock;/" \ -e "s/struct timer_list can_stattimer;/struct timer_list stattimer; /" \ -e "s/can\.can_rx_alldev_list/can.rx_alldev_list/g" \ -e "s/can\.can_rcvlists_lock/can.rcvlists_lock/g" \ -e "s/can\.can_stattimer/can.stattimer/g" \ include/net/netns/can.h \ net/can/*.[ch] Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	448c707494	can: proc: give variables holding CAN statistics a sensible name This patch rename the variables holding the CAN statistics (can_stats and can_pstats) to pkg_stats and rcv_lists_stats which reflect better their meaning. The conversion is done with: sed -i \ -e "s/can_stats$[^_]$/pkg_stats\1/g" \ -e "s/can_pstats/rcv_lists_stats/g" \ net/can/proc.c Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	e2c1f5c750	can: af_can: give variables holding CAN statistics a sensible name This patch rename the variables holding the CAN statistics (can_stats and can_pstats) to pkg_stats and rcv_lists_stats which reflect better their meaning. The conversion is done with: sed -i \ -e "s/can_stats$[^_]$/pkg_stats\1/g" \ -e "s/can_pstats/rcv_lists_stats/g" \ net/can/af_can.c Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:14 +02:00
Marc Kleine-Budde	2341086df4	can: netns: give members of struct netns_can holding the statistics a sensible name This patch gives the members of the struct netns_can that are holding the statistics a sensible name, by renaming struct netns_can::can_stats into struct netns_can::pkg_stats and struct netns_can::can_pstats into struct netns_can::rcv_lists_stats. The conversion is done with: sed -i \ -e "s:$struct[^]\$can_stats;.:\1pkg_stats;:" \ -e "s:$struct[^]\$can_pstats;.:\1rcv_lists_stats;:" \ -e "s/can\.can_stats/can.pkg_stats/g" \ -e "s/can\.can_pstats/can.rcv_lists_stats/g" \ net/can/*.[ch] \ include/net/netns/can.h Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:13 +02:00
Marc Kleine-Budde	6c43bb3a41	can: netns: give structs holding the CAN statistics a sensible name This patch renames both "struct s_stats" and "struct s_pstats", to "struct can_pkg_stats" and "struct can_rcv_lists_stats" to better reflect their meaning and improve code readability. The conversion is done with: sed -i \ -e "s/struct s_stats/struct can_pkg_stats/g" \ -e "s/struct s_pstats/struct can_rcv_lists_stats/g" \ net/can/*.[ch] \ include/net/netns/can.h Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-09-04 13:29:13 +02:00
Pablo Neira Ayuso	110e48725d	netfilter: nf_flow_table: set default timeout after successful insertion Set up the default timeout for this new entry otherwise the garbage collector might quickly remove it right after the flowtable insertion. Fixes: `ac2a66665e` ("netfilter: add generic flow table infrastructure") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-03 22:55:42 +02:00
Pablo Neira Ayuso	b067fa009c	netfilter: ctnetlink: honor IPS_OFFLOAD flag If this flag is set, timeout and state are irrelevant to userspace. Fixes: `90964016e5` ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-03 22:55:41 +02:00
Leonardo Bras	8820914139	netfilter: nft_fib_netdev: Terminate rule eval if protocol=IPv6 and ipv6 module is disabled If IPv6 is disabled on boot (ipv6.disable=1), but nft_fib_inet ends up dealing with a IPv6 packet, it causes a kernel panic in fib6_node_lookup_1(), crashing in bad_page_fault. The panic is caused by trying to deference a very low address (0x38 in ppc64le), due to ipv6.fib6_main_tbl = NULL. BUG: Kernel NULL pointer dereference at 0x00000038 The kernel panic was reproduced in a host that disabled IPv6 on boot and have to process guest packets (coming from a bridge) using it's ip6tables. Terminate rule evaluation when packet protocol is IPv6 but the ipv6 module is not loaded. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-03 22:53:56 +02:00
Fernando Fernandez Mancera	85936e56e9	netfilter: nft_quota: add quota object update support Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-03 19:05:00 +02:00
Fernando Fernandez Mancera	d62d0ba97b	netfilter: nf_tables: Introduce stateful object update operation This patch adds the infrastructure needed for the stateful object update support. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-03 19:01:25 +02:00
Lu Shuaibing	0ce772fe79	9p: Transport error uninitialized The p9_tag_alloc() does not initialize the transport error t_err field. The struct p9_req_t *req is allocated and stored in a struct p9_client variable. The field t_err is never initialized before p9_conn_cancel() checks its value. KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool) reports this bug. ================================================================== BUG: KUMSAN: use of uninitialized memory in p9_conn_cancel+0x2d9/0x3b0 Read of size 4 at addr ffff88805f9b600c by task kworker/1:2/1216 CPU: 1 PID: 1216 Comm: kworker/1:2 Not tainted 5.2.0-rc4+ #28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Workqueue: events p9_write_work Call Trace: dump_stack+0x75/0xae __kumsan_report+0x17c/0x3e6 kumsan_report+0xe/0x20 p9_conn_cancel+0x2d9/0x3b0 p9_write_work+0x183/0x4a0 process_one_work+0x4d1/0x8c0 worker_thread+0x6e/0x780 kthread+0x1ca/0x1f0 ret_from_fork+0x35/0x40 Allocated by task 1979: save_stack+0x19/0x80 __kumsan_kmalloc.constprop.3+0xbc/0x120 kmem_cache_alloc+0xa7/0x170 p9_client_prepare_req.part.9+0x3b/0x380 p9_client_rpc+0x15e/0x880 p9_client_create+0x3d0/0xac0 v9fs_session_init+0x192/0xc80 v9fs_mount+0x67/0x470 legacy_get_tree+0x70/0xd0 vfs_get_tree+0x4a/0x1c0 do_mount+0xba9/0xf90 ksys_mount+0xa8/0x120 __x64_sys_mount+0x62/0x70 do_syscall_64+0x6d/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff88805f9b6008 which belongs to the cache p9_req_t of size 144 The buggy address is located 4 bytes inside of 144-byte region [ffff88805f9b6008, ffff88805f9b6098) The buggy address belongs to the page: page:ffffea00017e6d80 refcount:1 mapcount:0 mapping:ffff888068b63740 index:0xffff88805f9b7d90 compound_mapcount: 0 flags: 0x100000000010200(slab\|head) raw: 0100000000010200 ffff888068b66450 ffff888068b66450 ffff888068b63740 raw: ffff88805f9b7d90 0000000000100001 00000001ffffffff 0000000000000000 page dumped because: kumsan: bad access detected ================================================================== Link: http://lkml.kernel.org/r/20190613070854.10434-1-shuaibinglu@126.com Signed-off-by: Lu Shuaibing <shuaibinglu@126.com> [dominique.martinet@cea.fr: grouped the added init with the others] Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>	2019-09-03 11:05:34 +00:00
Fernando Fernandez Mancera	039b1f4f24	netfilter: nft_socket: fix erroneous socket assignment The socket assignment is wrong, see skb_orphan(): When skb->destructor callback is not set, but skb->sk is set, this hits BUG(). Link: https://bugzilla.redhat.com/show_bug.cgi?id=1651813 Fixes: `554ced0a6e` ("netfilter: nf_tables: add support for native socket matching") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-02 23:20:59 +02:00
Leonardo Bras	48bd0d68cd	netfilter: bridge: Drops IPv6 packets if IPv6 module is not loaded A kernel panic can happen if a host has disabled IPv6 on boot and have to process guest packets (coming from a bridge) using it's ip6tables. IPv6 packets need to be dropped if the IPv6 module is not loaded, and the host ip6tables will be used. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-02 23:19:27 +02:00
Vladimir Oltean	4ba0ebbc6c	net: dsa: Fix off-by-one number of calls to devlink_port_unregister When a function such as dsa_slave_create fails, currently the following stack trace can be seen: [ 2.038342] sja1105 spi0.1: Probed switch chip: SJA1105T [ 2.054556] sja1105 spi0.1: Reset switch and programmed static config [ 2.063837] sja1105 spi0.1: Enabled switch tagging [ 2.068706] fsl-gianfar soc:ethernet@2d90000 eth2: error -19 setting up slave phy [ 2.076371] ------------[ cut here ]------------ [ 2.080973] WARNING: CPU: 1 PID: 21 at net/core/devlink.c:6184 devlink_free+0x1b4/0x1c0 [ 2.088954] Modules linked in: [ 2.092005] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 5.3.0-rc6-01360-g41b52e38d2b6-dirty #1746 [ 2.100912] Hardware name: Freescale LS1021A [ 2.105162] Workqueue: events deferred_probe_work_func [ 2.110287] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14) [ 2.117992] [<c030d8cc>] (show_stack) from [<c10b08d8>] (dump_stack+0xb4/0xc8) [ 2.125180] [<c10b08d8>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8) [ 2.132018] [<c0349d04>] (__warn) from [<c0349e34>] (warn_slowpath_null+0x40/0x48) [ 2.139549] [<c0349e34>] (warn_slowpath_null) from [<c0f19d74>] (devlink_free+0x1b4/0x1c0) [ 2.147772] [<c0f19d74>] (devlink_free) from [<c1064fc0>] (dsa_switch_teardown+0x60/0x6c) [ 2.155907] [<c1064fc0>] (dsa_switch_teardown) from [<c1065950>] (dsa_register_switch+0x8e4/0xaa8) [ 2.164821] [<c1065950>] (dsa_register_switch) from [<c0ba7fe4>] (sja1105_probe+0x21c/0x2ec) [ 2.173216] [<c0ba7fe4>] (sja1105_probe) from [<c0b35948>] (spi_drv_probe+0x80/0xa4) [ 2.180920] [<c0b35948>] (spi_drv_probe) from [<c0a4c1cc>] (really_probe+0x108/0x400) [ 2.188711] [<c0a4c1cc>] (really_probe) from [<c0a4c694>] (driver_probe_device+0x78/0x1bc) [ 2.196933] [<c0a4c694>] (driver_probe_device) from [<c0a4a3dc>] (bus_for_each_drv+0x58/0xb8) [ 2.205414] [<c0a4a3dc>] (bus_for_each_drv) from [<c0a4c024>] (__device_attach+0xd0/0x168) [ 2.213637] [<c0a4c024>] (__device_attach) from [<c0a4b1d0>] (bus_probe_device+0x84/0x8c) [ 2.221772] [<c0a4b1d0>] (bus_probe_device) from [<c0a4b72c>] (deferred_probe_work_func+0x84/0xc4) [ 2.230686] [<c0a4b72c>] (deferred_probe_work_func) from [<c03650a4>] (process_one_work+0x218/0x510) [ 2.239772] [<c03650a4>] (process_one_work) from [<c03660d8>] (worker_thread+0x2a8/0x5c0) [ 2.247908] [<c03660d8>] (worker_thread) from [<c036b348>] (kthread+0x148/0x150) [ 2.255265] [<c036b348>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c) [ 2.262444] Exception stack(0xea965fb0 to 0xea965ff8) [ 2.267466] 5fa0: 00000000 00000000 00000000 00000000 [ 2.275598] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 2.283729] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 2.290333] ---[ end trace ca5d506728a0581a ]--- devlink_free is complaining right here: WARN_ON(!list_empty(&devlink->port_list)); This happens because devlink_port_unregister is no longer done right away in dsa_port_setup when a DSA_PORT_TYPE_USER has failed. Vivien said about this change that: Also no need to call devlink_port_unregister from within dsa_port_setup as this step is inconditionally handled by dsa_port_teardown on error. which is not really true. The devlink_port_unregister function _is_ being called unconditionally from within dsa_port_setup, but not for this port that just failed, just for the previous ones which were set up. ports_teardown: for (i = 0; i < port; i++) dsa_port_teardown(&ds->ports[i]); Initially I was tempted to fix this by extending the "for" loop to also cover the port that failed during setup. But this could have potentially unforeseen consequences unrelated to devlink_port or even other types of ports than user ports, which I can't really test for. For example, if for some reason devlink_port_register itself would fail, then unconditionally unregistering it in dsa_port_teardown would not be a smart idea. The list might go on. So just make dsa_port_setup undo the setup it had done upon failure, and let the for loop undo the work of setting up the previous ports, which are guaranteed to be brought up to a consistent state. Fixes: `955222ca52` ("net: dsa: use a single switch statement for port setup") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-02 11:59:29 -07:00
David S. Miller	765b7590c9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net r8152 conflicts are the NAPI fixes in 'net' overlapping with some tasklet stuff in net-next Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-02 11:20:17 -07:00
Linus Torvalds	345464fb76	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Fix some length checks during OGM processing in batman-adv, from Sven Eckelmann. 2) Fix regression that caused netfilter conntrack sysctls to not be per-netns any more. From Florian Westphal. 3) Use after free in netpoll, from Feng Sun. 4) Guard destruction of pfifo_fast per-cpu qdisc stats with qdisc_is_percpu_stats(), from Davide Caratti. Similar bug is fixed in pfifo_fast_enqueue(). 5) Fix memory leak in mld_del_delrec(), from Eric Dumazet. 6) Handle neigh events on internal ports correctly in nfp, from John Hurley. 7) Clear SKB timestamp in NF flow table code so that it does not confuse fq scheduler. From Florian Westphal. 8) taprio destroy can crash if it is invoked in a failure path of taprio_init(), because the list head isn't setup properly yet and the list del is unconditional. Perform the list add earlier to address this. From Vladimir Oltean. 9) Make sure to reapply vlan filters on device up, in aquantia driver. From Dmitry Bogdanov. 10) sgiseeq driver releases DMA memory using free_page() instead of dma_free_attrs(). From Christophe JAILLET. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits) net: seeq: Fix the function used to release some memory in an error handling path enetc: Add missing call to 'pci_free_irq_vectors()' in probe and remove functions net: bcmgenet: use ethtool_op_get_ts_info() tc-testing: don't hardcode 'ip' in nsPlugin.py net: dsa: microchip: add KSZ8563 compatibility string dt-bindings: net: dsa: document additional Microchip KSZ8563 switch net: aquantia: fix out of memory condition on rx side net: aquantia: linkstate irq should be oneshot net: aquantia: reapply vlan filters on up net: aquantia: fix limit of vlan filters net: aquantia: fix removal of vlan 0 net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte taprio: Fix kernel panic in taprio_destroy net: dsa: microchip: fill regmap_config name rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2] net: stmmac: dwmac-rk: Don't fail if phy regulator is absent amd-xgbe: Fix error path in xgbe_mod_init() netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorder mac80211: Correctly set noencrypt for PAE frames ...	2019-09-01 18:45:28 -07:00
Colin Ian King	56fcd40f8a	netlabel: remove redundant assignment to pointer iter Pointer iter is being initialized with a value that is never read and is being re-assigned a little later on. The assignment is redundant and hence can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-01 11:45:02 -07:00

1 2 3 4 5 ...

57782 Commits