OpenCloudOS-Kernel/net/openvswitch
Aaron Conole 69c47b3763 openvswitch: Set the skbuff pkt_type for proper pmtud support.
[ Upstream commit 30a92c9e3d6b073932762bef2ac66f4ee784c657 ]

Open vSwitch is originally intended to switch at layer 2, only dealing with
Ethernet frames.  With the introduction of l3 tunnels support, it crossed
into the realm of needing to care a bit about some routing details when
making forwarding decisions.  If an oversized packet would need to be
fragmented during this forwarding decision, there is a chance for pmtu
to get involved and generate a routing exception.  This is gated by the
skbuff->pkt_type field.

When a flow is already loaded into the openvswitch module this field is
set up and transitioned properly as a packet moves from one port to
another.  In the case that a packet execute is invoked after a flow is
newly installed this field is not properly initialized.  This causes the
pmtud mechanism to omit sending the required exception messages across
the tunnel boundary and a second attempt needs to be made to make sure
that the routing exception is properly setup.  To fix this, we set the
outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get
to the openvswitch module via a port device or packet command.

Even for bridge ports as users, the pkt_type needs to be reset when
doing the transmit as the packet is truly outgoing and routing needs
to get involved post packet transformations, in the case of
VXLAN/GENEVE/udp-tunnel packets.  In general, the pkt_type on output
gets ignored, since we go straight to the driver, but in the case of
tunnel ports they go through IP routing layer.

This issue is periodically encountered in complex setups, such as large
openshift deployments, where multiple sets of tunnel traversal occurs.
A way to recreate this is with the ovn-heater project that can setup
a networking environment which mimics such large deployments.  We need
larger environments for this because we need to ensure that flow
misses occur.  In these environment, without this patch, we can see:

  ./ovn_cluster.sh start
  podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
  podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache
  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)

  --- 21.0.0.3 ping statistics ---
  ...

Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not
sent into the server.

With this patch, setting the pkt_type, we see the following:

  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
  ping: local error: message too long, mtu=1222

  --- 21.0.0.3 ping statistics ---
  ...

In this case, the first ping request receives the FRAG_NEEDED message and
a local routing exception is created.

Tested-by: Jaime Caamano <jcaamano@redhat.com>
Reported-at: https://issues.redhat.com/browse/FDP-164
Fixes: 58264848a5 ("openvswitch: Add vxlan tunneling support.")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20240516200941.16152-1-aconole@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12 11:12:49 +02:00
..
Kconfig net: create nf_conntrack_ovs for ovs and tc use 2023-02-10 16:23:03 -08:00
Makefile openvswitch: add trace points 2021-06-22 10:47:32 -07:00
actions.c openvswitch: Set the skbuff pkt_type for proper pmtud support. 2024-06-12 11:12:49 +02:00
conntrack.c net: openvswitch: Fix Use-After-Free in ovs_ct_exit 2024-05-02 16:32:38 +02:00
conntrack.h net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct 2021-03-16 15:22:18 -07:00
datapath.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
datapath.h net/sched: Enable tc skb ext allocation on chain miss only when needed 2022-02-05 10:12:53 +00:00
dp_notify.c net: openvswitch: use netif_ovs_is_port() instead of opencode 2019-07-08 15:53:25 -07:00
drop.h net: openvswitch: add misc error drop reasons 2023-08-14 08:01:06 +01:00
flow.c net: openvswitch: fix overwriting ct original tuple for ICMPv6 2024-06-12 11:11:52 +02:00
flow.h net: openvswitch: reduce cpu_used_mask memory 2023-02-06 22:36:29 -08:00
flow_netlink.c net: openvswitch: limit the number of recursions from action sets 2024-02-23 09:24:51 +01:00
flow_netlink.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 269 2019-06-05 17:30:29 +02:00
flow_table.c There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
flow_table.h net: openvswitch: fix to make sure flow_lookup() is not preempted 2020-10-18 12:29:36 -07:00
meter.c genetlink: remove userhdr from struct genl_info 2023-08-15 14:54:44 -07:00
meter.h net: openvswitch: use u64 for meter bucket 2020-04-23 18:26:11 -07:00
openvswitch_trace.c openvswitch: add trace points 2021-06-22 10:47:32 -07:00
openvswitch_trace.h openvswitch: add trace points 2021-06-22 10:47:32 -07:00
vport-geneve.c rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link 2022-10-31 18:10:21 -07:00
vport-gre.c rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link 2022-10-31 18:10:21 -07:00
vport-internal_dev.c openvswitch: Change the return type for vport_ops.send function hook to int 2022-09-19 18:28:50 -07:00
vport-internal_dev.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 269 2019-06-05 17:30:29 +02:00
vport-netdev.c rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link 2022-10-31 18:10:21 -07:00
vport-netdev.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 269 2019-06-05 17:30:29 +02:00
vport-vxlan.c rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link 2022-10-31 18:10:21 -07:00
vport.c net: openvswitch: fix upcall counter access before allocation 2023-06-07 12:25:05 +01:00
vport.h net: openvswitch: Add support to count upcall packets 2022-12-09 10:43:46 +00:00