There is a race condition in vxlan that when deleting a vxlan device
during receiving packets, there is a possibility that the sock is
released after getting vxlan_sock vs from sk_user_data. Then in
later vxlan_ecn_decapsulate(), vxlan_get_sk_family() we will got
NULL pointer dereference. e.g.
#0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757
#1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d
#2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48
#3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b
#4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb
#5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542
#6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62
[exception RIP: vxlan_ecn_decapsulate+0x3b]
RIP: ffffffffc1014e7b RSP: ffffa25ec6978cb0 RFLAGS: 00010246
RAX: 0000000000000008 RBX: ffff8aa000888000 RCX: 0000000000000000
RDX: 000000000000000e RSI: ffff8a9fc7ab803e RDI: ffff8a9fd1168700
RBP: ffff8a9fc7ab803e R8: 0000000000700000 R9: 00000000000010ae
R10: ffff8a9fcb748980 R11: 0000000000000000 R12: ffff8a9fd1168700
R13: ffff8aa000888000 R14: 00000000002a0000 R15: 00000000000010ae
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan]
#8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507
#9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45
#10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807
#11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951
#12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde
#13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b
#14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139
#15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a
#16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3
#17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca
#18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3
Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh
Fix this by waiting for all sk_user_data reader to finish before
releasing the sock.
Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Fixes: 6a93cc9052 ("udp-tunnel: Add a few more UDP tunnel APIs")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing
it to siphon off UDP packets early in the handling of received UDP packets
thereby avoiding the packet going through the UDP receive queue, it doesn't
get ICMP packets through the UDP ->sk_error_report() callback. In fact, it
doesn't appear that there's any usable option for getting hold of ICMP
packets.
Fix this by adding a new UDP encap hook to distribute error messages for
UDP tunnels. If the hook is set, then the tunnel driver will be able to
see ICMP packets. The hook provides the offset into the packet of the UDP
header of the original packet that caused the notification.
An alternative would be to call the ->error_handler() hook - but that
requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error()
do, though isn't really necessary or desirable in rxrpc's case is we want
to parse them there and then, not queue them).
Changes
=======
ver #3)
- Fixed an uninitialised variable.
ver #2)
- Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals.
Fixes: 5271953cad ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: David Howells <dhowells@redhat.com>
udp_tunnel_core.c hasn't use any macro or function declared in udp.h, types.h,
and net_namespace.h. Thus, these files can be removed from udp_tunnel_core.c
safely without affecting the compilation of the net module.
Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the NETIF_F_RX_UDP_TUNNEL_PORT feature check into
udp_tunnel_nic_*_port() helpers, since they're always
done right before the call.
Add similar checks before calling the notifier.
udp_tunnel_nic invokes the notifier without checking
features which could result in some wasted cycles.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All drivers use udp_tunnel_nic_*_port() helpers, prepare for
NDO removal by invoking those helpers directly.
The helpers are safe to call on all devices, they check if
device has the UDP tunnel state initialized.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cater to devices which:
(a) may want to sleep in the callbacks;
(b) only have IPv4 support;
(c) need all the programming to happen while the netdev is up.
Drivers attach UDP tunnel offload info struct to their netdevs,
where they declare how many UDP ports of various tunnel types
they support. Core takes care of tracking which ports to offload.
Use a fixed-size array since this matches what almost all drivers
do, and avoids a complexity and uncertainty around memory allocations
in an atomic context.
Make sure that tunnel drivers don't try to replay the ports when
new NIC netdev is registered. Automatic replays would mess up
reference counting, and will be removed completely once all drivers
are converted.
v4:
- use a #define NULL to avoid build issues with CONFIG_INET=n.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>