OpenCloudOS-Kernel/net
Aaron Conole 69c47b3763 openvswitch: Set the skbuff pkt_type for proper pmtud support.
[ Upstream commit 30a92c9e3d6b073932762bef2ac66f4ee784c657 ]

Open vSwitch is originally intended to switch at layer 2, only dealing with
Ethernet frames.  With the introduction of l3 tunnels support, it crossed
into the realm of needing to care a bit about some routing details when
making forwarding decisions.  If an oversized packet would need to be
fragmented during this forwarding decision, there is a chance for pmtu
to get involved and generate a routing exception.  This is gated by the
skbuff->pkt_type field.

When a flow is already loaded into the openvswitch module this field is
set up and transitioned properly as a packet moves from one port to
another.  In the case that a packet execute is invoked after a flow is
newly installed this field is not properly initialized.  This causes the
pmtud mechanism to omit sending the required exception messages across
the tunnel boundary and a second attempt needs to be made to make sure
that the routing exception is properly setup.  To fix this, we set the
outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get
to the openvswitch module via a port device or packet command.

Even for bridge ports as users, the pkt_type needs to be reset when
doing the transmit as the packet is truly outgoing and routing needs
to get involved post packet transformations, in the case of
VXLAN/GENEVE/udp-tunnel packets.  In general, the pkt_type on output
gets ignored, since we go straight to the driver, but in the case of
tunnel ports they go through IP routing layer.

This issue is periodically encountered in complex setups, such as large
openshift deployments, where multiple sets of tunnel traversal occurs.
A way to recreate this is with the ovn-heater project that can setup
a networking environment which mimics such large deployments.  We need
larger environments for this because we need to ensure that flow
misses occur.  In these environment, without this patch, we can see:

  ./ovn_cluster.sh start
  podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
  podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache
  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)

  --- 21.0.0.3 ping statistics ---
  ...

Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not
sent into the server.

With this patch, setting the pkt_type, we see the following:

  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
  ping: local error: message too long, mtu=1222

  --- 21.0.0.3 ping statistics ---
  ...

In this case, the first ping request receives the FRAG_NEEDED message and
a local routing exception is created.

Tested-by: Jaime Caamano <jcaamano@redhat.com>
Reported-at: https://issues.redhat.com/browse/FDP-164
Fixes: 58264848a5 ("openvswitch: Add vxlan tunneling support.")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20240516200941.16152-1-aconole@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12 11:12:49 +02:00
..
6lowpan 6lowpan: Remove redundant initialisation. 2023-03-29 08:22:52 +01:00
9p 9p: Fix read/write debug statements to report server reply 2024-04-10 16:35:57 +02:00
802
8021q net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb 2024-05-17 12:02:07 +02:00
appletalk appletalk: Fix Use-After-Free in atalk_ioctl 2023-12-20 17:01:50 +01:00
atm atm: Fix Use-After-Free in do_vcc_ioctl 2023-12-20 17:01:48 +01:00
ax25 ax25: Fix reference count leak issue of net_device 2024-06-12 11:11:54 +02:00
batman-adv batman-adv: Avoid infinite loop trying to resize local TT 2024-04-17 11:19:25 +02:00
bluetooth Bluetooth: HCI: Remove HCI_AMP support 2024-06-12 11:11:55 +02:00
bpf bpf: Fix a few selftest failures due to llvm18 change 2024-02-05 20:14:20 +00:00
bpfilter net: Use umd_cleanup_helper() 2023-05-31 13:06:57 +02:00
bridge net: bridge: mst: fix vlan use-after-free 2024-06-12 11:12:12 +02:00
caif sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
can can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER) 2024-02-23 09:25:17 +01:00
ceph libceph: fail sparse-read if the data length doesn't match 2024-03-01 13:34:56 +01:00
core net: give more chances to rcu in netdev_wait_allrefs_any() 2024-06-12 11:11:47 +02:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-01 21:07:46 -07:00
dccp dccp/tcp: Call security_inet_conn_request() after setting IPv6 addresses. 2023-11-20 11:59:35 +01:00
devlink devlink: fix port new reply cmd type 2024-03-26 18:20:11 -04:00
dns_resolver keys, dns: Fix size check of V1 server-list header 2024-01-25 15:35:41 -08:00
dsa net: dsa: mark parsed interface mode for legacy switch drivers 2023-08-09 13:08:09 -07:00
ethernet ethernet: Add helper for assigning packet type when dest address does not match device address 2024-05-02 16:32:46 +02:00
ethtool ethtool: netlink: Add missing ethnl_ops_begin/complete 2024-01-25 15:36:00 -08:00
handshake net/handshake: Fix handshake_req_destroy_test1 2024-02-23 09:24:50 +01:00
hsr hsr: Simplify code for announcing HSR nodes timer setup 2024-05-17 12:02:24 +02:00
ieee802154 sysctl-6.6-rc1 2023-08-29 17:39:15 -07:00
ife net: sched: ife: fix potential use-after-free 2024-01-01 12:42:30 +00:00
ipv4 tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). 2024-06-12 11:12:48 +02:00
ipv6 ipv6: sr: fix memleak in seg6_hmac_init_algo 2024-06-12 11:12:48 +02:00
iucv net/iucv: fix the allocation size of iucv_path_table array 2024-03-26 18:19:12 -04:00
kcm net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function 2024-03-26 18:19:40 -04:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
l2tp net l2tp: drop flow hash on forward 2024-05-17 12:02:02 +02:00
l3mdev
lapb
llc llc: call sock_orphan() at release time 2024-02-05 20:14:36 +00:00
mac80211 wifi: mac80211: ensure beacon is non-S1G prior to extracting the beacon timestamp field 2024-06-12 11:11:22 +02:00
mac802154 mac802154: fix llsec key resources release in mac802154_llsec_key_del 2024-04-03 15:28:27 +02:00
mctp net: mctp: copy skb ext data when fragmenting 2024-03-26 18:19:34 -04:00
mpls net: mpls: error out if inner headers are not set 2024-04-13 13:07:41 +02:00
mptcp mptcp: SO_KEEPALIVE: fix getsockopt support 2024-06-12 11:11:54 +02:00
ncsi net/ncsi: Fix netlink major/minor version numbers 2024-01-25 15:35:20 -08:00
netfilter netfilter: nf_tables: honor table dormant flag from netdev release event path 2024-05-02 16:32:39 +02:00
netlabel calipso: fix memory leak in netlbl_calipso_add_pass() 2024-01-25 15:35:14 -08:00
netlink netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter 2024-03-06 14:48:34 +00:00
netrom netrom: fix possible dead-lock in nr_rt_ioctl() 2024-06-12 11:12:12 +02:00
nfc nfc: nci: Fix uninit-value in nci_rx_work 2024-06-12 11:12:47 +02:00
nsh nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). 2024-05-17 12:02:02 +02:00
openvswitch openvswitch: Set the skbuff pkt_type for proper pmtud support. 2024-06-12 11:12:49 +02:00
packet af_packet: do not call packet_read_pending() from tpacket_destruct_skb() 2024-06-12 11:12:12 +02:00
phonet phonet: fix rtm_phonet_notify() skb allocation 2024-05-17 12:02:22 +02:00
psample psample: Require 'CAP_NET_ADMIN' when joining "packets" group 2023-12-13 18:45:10 +01:00
qrtr net: qrtr: ns: Fix module refcnt 2024-06-12 11:12:12 +02:00
rds net/rds: fix possible cp null dereference 2024-04-10 16:35:49 +02:00
rfkill net: rfkill: gpio: set GPIO direction 2024-01-01 12:42:41 +00:00
rose net/rose: fix races in rose_kill_by_device() 2024-01-01 12:42:31 +00:00
rxrpc rxrpc: Only transmit one ACK per jumbo packet received 2024-05-17 12:02:23 +02:00
sched net/sched: fix lockdep splat in qdisc_tree_reduce_backlog() 2024-04-10 16:35:51 +02:00
sctp sctp: fix busy polling 2024-01-25 15:35:30 -08:00
smc net/smc: fix neighbour and rtable leak in smc_ib_find_route() 2024-05-17 12:02:24 +02:00
strparser
sunrpc rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL 2024-06-12 11:12:48 +02:00
switchdev net: bridge: switchdev: Skip MDB replays of deferred events on offload 2024-03-01 13:35:06 +01:00
tipc tipc: fix UAF in error path 2024-05-17 12:02:32 +02:00
tls tls: fix lockless read of strp->msg_ready in ->poll 2024-05-02 16:32:40 +02:00
unix af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock. 2024-06-12 11:12:48 +02:00
vmw_vsock vsock/virtio: fix packet delivery to tap device 2024-04-10 16:35:50 +02:00
wireless wifi: nl80211: Avoid address calculations via out of bounds array indexing 2024-06-12 11:11:48 +02:00
x25 net/x25: fix incorrect parameter validation in the x25_getsockopt() function 2024-03-26 18:19:41 -04:00
xdp xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING 2024-04-17 11:19:28 +02:00
xfrm xfrm: Preserve vlan tags for transport mode software GRO 2024-05-17 12:02:20 +02:00
Kconfig bpf: Add fd-based tcx multi-prog infra with link support 2023-07-19 10:07:27 -07:00
Kconfig.debug
Makefile net/handshake: Create a NETLINK service for handling handshake requests 2023-04-19 18:48:48 -07:00
compat.c net/compat: Update msg_control_is_user when setting a kernel pointer 2023-04-14 11:09:27 +01:00
devres.c
socket.c net: Save and restore msg_namelen in sock_sendmsg 2024-01-10 17:16:51 +01:00
sysctl_net.c sysctl: Add size to register_net_sysctl function 2023-08-15 15:26:17 -07:00