Networking fixes for 6.4-rc2, including fixes from netfilter

Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - mtk_eth_soc: fix NULL pointer dereference

  Previous releases - regressions:

   - core:
      - skb_partial_csum_set() fix against transport header magic value
      - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
      - annotate sk->sk_err write from do_recvmmsg()
      - add vlan_get_protocol_and_depth() helper

   - netlink: annotate accesses to nlk->cb_running

   - netfilter: always release netdev hooks from notifier

  Previous releases - always broken:

   - core: deal with most data-races in sk_wait_event()

   - netfilter: fix possible bug_on with enable_hooks=1

   - eth: bonding: fix send_peer_notif overflow

   - eth: xpcs: fix incorrect number of interfaces

   - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb

   - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"

* tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  af_unix: Fix data races around sk->sk_shutdown.
  af_unix: Fix a data race of sk->sk_receive_queue->qlen.
  net: datagram: fix data-races in datagram_poll()
  net: mscc: ocelot: fix stat counter register values
  ipvlan:Fix out-of-bounds caused by unclear skb->cb
  docs: networking: fix x25-iface.rst heading & index order
  gve: Remove the code of clearing PBA bit
  tcp: add annotations around sk->sk_shutdown accesses
  net: add vlan_get_protocol_and_depth() helper
  net: pcs: xpcs: fix incorrect number of interfaces
  net: deal with most data-races in sk_wait_event()
  net: annotate sk->sk_err write from do_recvmmsg()
  netlink: annotate accesses to nlk->cb_running
  kselftest: bonding: add num_grat_arp test
  selftests: forwarding: lib: add netns support for tc rule handle stats get
  Documentation: bonding: fix the doc of peer_notif_delay
  bonding: fix send_peer_notif overflow
  net: ethernet: mtk_eth_soc: fix NULL pointer dereference
  selftests: nft_flowtable.sh: check ingress/egress chain too
  selftests: nft_flowtable.sh: monitor result file sizes
  ...
Linus Torvalds, 2023-05-11 08:42:47 -05:00
commit 6e27831b91
49 changed files with 361 additions and 112 deletions

@@ -776,10 +776,11 @@ peer_notif_delay
 	Specify the delay, in milliseconds, between each peer
 	notification (gratuitous ARP and unsolicited IPv6 Neighbor
 	Advertisement) when they are issued after a failover event.
-	This delay should be a multiple of the link monitor interval
-	(arp_interval or miimon, whichever is active). The default
-	value is 0 which means to match the value of the link monitor
-	interval.
+	This delay should be a multiple of the MII link monitor interval
+	(miimon).
+
+	The valid range is 0 - 300000. The default value is 0, which means
+	to match the value of the MII link monitor interval.
 
 prio
 	Slave priority. A higher number means higher priority.

@@ -116,8 +116,8 @@ Contents:
    udplite
    vrf
    vxlan
-   x25-iface
    x25
+   x25-iface
    xfrm_device
    xfrm_proc
    xfrm_sync

@@ -1,8 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-============================-
 X.25 Device Driver Interface
-============================-
+============================
 
 Version 1.1

@@ -84,6 +84,11 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+/* Limit the max delay range to 300s */
+static struct netlink_range_validation delay_range = {
+	.max = 300000,
+};
+
 static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
 	[IFLA_BOND_MODE]		= { .type = NLA_U8 },
 	[IFLA_BOND_ACTIVE_SLAVE]	= { .type = NLA_U32 },
@@ -114,7 +119,7 @@ static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
 	[IFLA_BOND_AD_ACTOR_SYSTEM]	= { .type = NLA_BINARY,
 					    .len  = ETH_ALEN },
 	[IFLA_BOND_TLB_DYNAMIC_LB]	= { .type = NLA_U8 },
-	[IFLA_BOND_PEER_NOTIF_DELAY]	= { .type = NLA_U32 },
+	[IFLA_BOND_PEER_NOTIF_DELAY]	= NLA_POLICY_FULL_RANGE(NLA_U32, &delay_range),
 	[IFLA_BOND_MISSED_MAX]		= { .type = NLA_U8 },
 	[IFLA_BOND_NS_IP6_TARGET]	= { .type = NLA_NESTED },
 };

@@ -169,6 +169,12 @@ static const struct bond_opt_value bond_num_peer_notif_tbl[] = {
 	{ NULL,      -1,  0}
 };
 
+static const struct bond_opt_value bond_peer_notif_delay_tbl[] = {
+	{ "off",     0,      0},
+	{ "maxval",  300000, BOND_VALFLAG_MAX},
+	{ NULL,      -1,     0}
+};
+
 static const struct bond_opt_value bond_primary_reselect_tbl[] = {
 	{ "always",  BOND_PRI_RESELECT_ALWAYS,  BOND_VALFLAG_DEFAULT},
 	{ "better",  BOND_PRI_RESELECT_BETTER,  0},
@@ -488,7 +494,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
 		.id = BOND_OPT_PEER_NOTIF_DELAY,
 		.name = "peer_notif_delay",
 		.desc = "Delay between each peer notification on failover event, in milliseconds",
-		.values = bond_intmax_tbl,
+		.values = bond_peer_notif_delay_tbl,
 		.set = bond_option_peer_notif_delay_set
 	}
 };

@@ -294,19 +294,6 @@ static int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
 	bool reschedule = false;
 	int work_done = 0;
 
-	/* Clear PCI MSI-X Pending Bit Array (PBA)
-	 *
-	 * This bit is set if an interrupt event occurs while the vector is
-	 * masked. If this bit is set and we reenable the interrupt, it will
-	 * fire again. Since we're just about to poll the queue state, we don't
-	 * need it to fire again.
-	 *
-	 * Under high softirq load, it's possible that the interrupt condition
-	 * is triggered twice before we got the chance to process it.
-	 */
-	gve_write_irq_doorbell_dqo(priv, block,
-				   GVE_ITR_NO_UPDATE_DQO | GVE_ITR_CLEAR_PBA_BIT_DQO);
-
 	if (block->tx)
 		reschedule |= gve_tx_poll_dqo(block, /*do_clean=*/true);

@@ -654,7 +654,7 @@ __mtk_wed_detach(struct mtk_wed_device *dev)
 					   BIT(hw->index), BIT(hw->index));
 	}
 
-	if (!hw_list[!hw->index]->wed_dev &&
+	if ((!hw_list[!hw->index] || !hw_list[!hw->index]->wed_dev) &&
 	    hw->eth->dma_dev != hw->eth->dev)
 		mtk_eth_set_dma_device(hw->eth, hw->eth->dev);

@@ -307,15 +307,15 @@ static const u32 vsc7514_sys_regmap[] = {
 	REG(SYS_COUNT_DROP_YELLOW_PRIO_4,	0x000218),
 	REG(SYS_COUNT_DROP_YELLOW_PRIO_5,	0x00021c),
 	REG(SYS_COUNT_DROP_YELLOW_PRIO_6,	0x000220),
-	REG(SYS_COUNT_DROP_YELLOW_PRIO_7,	0x000214),
-	REG(SYS_COUNT_DROP_GREEN_PRIO_0,	0x000218),
-	REG(SYS_COUNT_DROP_GREEN_PRIO_1,	0x00021c),
-	REG(SYS_COUNT_DROP_GREEN_PRIO_2,	0x000220),
-	REG(SYS_COUNT_DROP_GREEN_PRIO_3,	0x000224),
-	REG(SYS_COUNT_DROP_GREEN_PRIO_4,	0x000228),
-	REG(SYS_COUNT_DROP_GREEN_PRIO_5,	0x00022c),
-	REG(SYS_COUNT_DROP_GREEN_PRIO_6,	0x000230),
-	REG(SYS_COUNT_DROP_GREEN_PRIO_7,	0x000234),
+	REG(SYS_COUNT_DROP_YELLOW_PRIO_7,	0x000224),
+	REG(SYS_COUNT_DROP_GREEN_PRIO_0,	0x000228),
+	REG(SYS_COUNT_DROP_GREEN_PRIO_1,	0x00022c),
+	REG(SYS_COUNT_DROP_GREEN_PRIO_2,	0x000230),
+	REG(SYS_COUNT_DROP_GREEN_PRIO_3,	0x000234),
+	REG(SYS_COUNT_DROP_GREEN_PRIO_4,	0x000238),
+	REG(SYS_COUNT_DROP_GREEN_PRIO_5,	0x00023c),
+	REG(SYS_COUNT_DROP_GREEN_PRIO_6,	0x000240),
+	REG(SYS_COUNT_DROP_GREEN_PRIO_7,	0x000244),
 	REG(SYS_RESET_CFG,			0x000508),
 	REG(SYS_CMID,				0x00050c),
 	REG(SYS_VLAN_ETYPE_CFG,			0x000510),

@@ -181,6 +181,7 @@ enum power_event {
 #define GMAC4_LPI_CTRL_STATUS		0xd0
 #define GMAC4_LPI_TIMER_CTRL		0xd4
 #define GMAC4_LPI_ENTRY_TIMER		0xd8
+#define GMAC4_MAC_ONEUS_TIC_COUNTER	0xdc
 
 /* LPI control and status defines */
 #define GMAC4_LPI_CTRL_STATUS_LPITCSE	BIT(21)	/* LPI Tx Clock Stop Enable */

@@ -25,6 +25,7 @@ static void dwmac4_core_init(struct mac_device_info *hw,
 	struct stmmac_priv *priv = netdev_priv(dev);
 	void __iomem *ioaddr = hw->pcsr;
 	u32 value = readl(ioaddr + GMAC_CONFIG);
+	u32 clk_rate;
 
 	value |= GMAC_CORE_INIT;
 
@@ -47,6 +48,10 @@ static void dwmac4_core_init(struct mac_device_info *hw,
 
 	writel(value, ioaddr + GMAC_CONFIG);
 
+	/* Configure LPI 1us counter to number of CSR clock ticks in 1us - 1 */
+	clk_rate = clk_get_rate(priv->plat->stmmac_clk);
+	writel((clk_rate / 1000000) - 1, ioaddr + GMAC4_MAC_ONEUS_TIC_COUNTER);
+
 	/* Enable GMAC interrupts */
 	value = GMAC_INT_DEFAULT_ENABLE;
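
The new register write stores the number of CSR clock cycles per microsecond, minus one. A quick worked example of that math, assuming a 300 MHz CSR clock (the rate is platform-specific, not taken from the patch):

	/* Hypothetical host-side illustration of the ONEUS_TIC_COUNTER math. */
	#include <stdio.h>

	int main(void)
	{
		unsigned long clk_rate = 300000000UL;         /* assumed 300 MHz CSR clock */
		unsigned long tic = clk_rate / 1000000 - 1;   /* cycles per microsecond, minus one */

		printf("MAC_ONEUS_TIC_COUNTER = %lu\n", tic); /* prints 299 */
		return 0;
	}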

@@ -436,6 +436,9 @@ static int ipvlan_process_v4_outbound(struct sk_buff *skb)
 		goto err;
 	}
 	skb_dst_set(skb, &rt->dst);
+
+	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
+
 	err = ip_local_out(net, skb->sk, skb);
 	if (unlikely(net_xmit_eval(err)))
 		dev->stats.tx_errors++;
@@ -474,6 +477,9 @@ static int ipvlan_process_v6_outbound(struct sk_buff *skb)
 		goto err;
 	}
 	skb_dst_set(skb, dst);
+
+	memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
+
 	err = ip6_local_out(net, skb->sk, skb);
 	if (unlikely(net_xmit_eval(err)))
 		dev->stats.tx_errors++;

@@ -67,6 +67,7 @@ static int mvusb_mdio_probe(struct usb_interface *interface,
 	struct device *dev = &interface->dev;
 	struct mvusb_mdio *mvusb;
 	struct mii_bus *mdio;
+	int ret;
 
 	mdio = devm_mdiobus_alloc_size(dev, sizeof(*mvusb));
 	if (!mdio)
@@ -87,7 +88,15 @@ static int mvusb_mdio_probe(struct usb_interface *interface,
 	mdio->write = mvusb_mdio_write;
 
 	usb_set_intfdata(interface, mvusb);
-	return of_mdiobus_register(mdio, dev->of_node);
+	ret = of_mdiobus_register(mdio, dev->of_node);
+	if (ret)
+		goto put_dev;
+
+	return 0;
+
+put_dev:
+	usb_put_dev(mvusb->udev);
+	return ret;
 }
 
 static void mvusb_mdio_disconnect(struct usb_interface *interface)
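
The added error path follows the usual probe unwind idiom: a reference taken earlier in probe must be dropped on every later failure. A minimal sketch of the idiom with hypothetical names (acquire_resource/register_thing are stand-ins, not driver functions):

	int example_probe(void)
	{
		int ret;

		ret = acquire_resource();   /* e.g. take a device refcount */
		if (ret)
			return ret;

		ret = register_thing();     /* can still fail after the acquire */
		if (ret)
			goto release;       /* unwind in reverse order */

		return 0;

	release:
		release_resource();         /* drop the refcount on failure */
		return ret;
	}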

@@ -1203,7 +1203,7 @@ static const struct xpcs_compat synopsys_xpcs_compat[DW_XPCS_INTERFACE_MAX] = {
 	[DW_XPCS_2500BASEX] = {
 		.supported = xpcs_2500basex_features,
 		.interface = xpcs_2500basex_interfaces,
-		.num_interfaces = ARRAY_SIZE(xpcs_2500basex_features),
+		.num_interfaces = ARRAY_SIZE(xpcs_2500basex_interfaces),
 		.an_mode = DW_2500BASEX,
 	},
 };
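
The bug was a copy-paste hazard: ARRAY_SIZE() accepts any array, so sizing the interface count by the features array compiles cleanly and only misbehaves where the two lengths diverge. A contrived, self-contained sketch of the failure mode (names are illustrative, not from the driver):

	#include <stdio.h>

	#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

	static const int features[]   = { 1, 2, 3, 4 };  /* 4 entries */
	static const int interfaces[] = { 10, 20 };      /* 2 entries */

	int main(void)
	{
		/* Correct: bound the loop by the array actually indexed.
		 * Using ARRAY_SIZE(features) here would read past the end.
		 */
		for (unsigned i = 0; i < ARRAY_SIZE(interfaces); i++)
			printf("%d\n", interfaces[i]);
		return 0;
	}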

@@ -40,6 +40,11 @@ static inline int bcm_phy_write_exp_sel(struct phy_device *phydev,
 	return bcm_phy_write_exp(phydev, reg | MII_BCM54XX_EXP_SEL_ER, val);
 }
 
+static inline int bcm_phy_read_exp_sel(struct phy_device *phydev, u16 reg)
+{
+	return bcm_phy_read_exp(phydev, reg | MII_BCM54XX_EXP_SEL_ER);
+}
+
 int bcm54xx_auxctl_write(struct phy_device *phydev, u16 regnum, u16 val);
 int bcm54xx_auxctl_read(struct phy_device *phydev, u16 regnum);

@@ -486,7 +486,7 @@ static int bcm7xxx_16nm_ephy_afe_config(struct phy_device *phydev)
 	bcm_phy_write_misc(phydev, 0x0038, 0x0002, 0xede0);
 
 	/* Read CORE_EXPA9 */
-	tmp = bcm_phy_read_exp(phydev, 0x00a9);
+	tmp = bcm_phy_read_exp_sel(phydev, 0x00a9);
 	/* CORE_EXPA9[6:1] is rcalcode[5:0] */
 	rcalcode = (tmp & 0x7e) / 2;
 	/* Correct RCAL code + 1 is -1% rprogr, LP: +16 */

@@ -742,7 +742,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 
 	/* Move network header to the right position for VLAN tagged packets */
 	if (eth_type_vlan(skb->protocol) &&
-	    __vlan_get_protocol(skb, skb->protocol, &depth) != 0)
+	    vlan_get_protocol_and_depth(skb, skb->protocol, &depth) != 0)
 		skb_set_network_header(skb, depth);
 
 	/* copy skb_ubuf_info for callback when skb has no error */
@@ -1197,7 +1197,7 @@ static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp)
 
 	/* Move network header to the right position for VLAN tagged packets */
 	if (eth_type_vlan(skb->protocol) &&
-	    __vlan_get_protocol(skb, skb->protocol, &depth) != 0)
+	    vlan_get_protocol_and_depth(skb, skb->protocol, &depth) != 0)
 		skb_set_network_header(skb, depth);
 
 	rcu_read_lock();

@@ -236,8 +236,9 @@ void dim_park_tired(struct dim *dim);
  *
  * Calculate the delta between two samples (in data rates).
  * Takes into consideration counter wrap-around.
+ * Returned boolean indicates whether curr_stats are reliable.
  */
-void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
+bool dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 		    struct dim_stats *curr_stats);
 
 /**

@@ -637,6 +637,23 @@ static inline __be16 vlan_get_protocol(const struct sk_buff *skb)
 	return __vlan_get_protocol(skb, skb->protocol, NULL);
 }
 
+/* This version of __vlan_get_protocol() also pulls mac header in skb->head */
+static inline __be16 vlan_get_protocol_and_depth(struct sk_buff *skb,
+						 __be16 type, int *depth)
+{
+	int maclen;
+
+	type = __vlan_get_protocol(skb, type, &maclen);
+
+	if (type) {
+		if (!pskb_may_pull(skb, maclen))
+			type = 0;
+		else if (depth)
+			*depth = maclen;
+	}
+	return type;
+}
+
 /* A getter for the SKB protocol field which will handle VLAN tags consistently
  * whether VLAN acceleration is enabled or not.
  */
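
The helper's contract: on success it returns the inner protocol and guarantees that maclen bytes are linear in skb->head, so depth is safe to feed to skb_set_network_header(); on failure it returns 0 and depth must not be used. A sketch of the calling pattern, mirroring the tap.c and af_packet.c hunks elsewhere in this series:

	static void fix_network_header(struct sk_buff *skb)
	{
		int depth;

		/* depth is only valid when the helper returns non-zero */
		if (eth_type_vlan(skb->protocol) &&
		    vlan_get_protocol_and_depth(skb, skb->protocol, &depth) != 0)
			skb_set_network_header(skb, depth);
	}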

@@ -233,7 +233,7 @@ struct bonding {
 	 */
 	spinlock_t mode_lock;
 	spinlock_t stats_lock;
-	u8	 send_peer_notif;
+	u32	 send_peer_notif;
 	u8       igmp_retrans;
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry *proc_entry;
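
Why widening to u32 matters: the counter is loaded with, roughly, num_grat_arp multiplied by the number of monitor intervals covered by peer_notif_delay, and with the newly capped 300 s delay that product easily exceeds a u8. A worked example with assumed values (num_grat_arp = 30, miimon = 100 ms, peer_notif_delay = 1000 ms, close to what the new selftest exercises):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned int num_grat_arp = 30, miimon = 100, delay = 1000;
		unsigned int val = num_grat_arp * (delay / miimon); /* 300 */

		uint8_t  old_counter = val;  /* u8 truncates 300 to 44 */
		uint32_t new_counter = val;  /* u32 keeps the intended 300 */

		printf("u8=%u u32=%u\n", old_counter, new_counter);
		return 0;
	}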

@@ -2718,7 +2718,7 @@ static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk,
 		__sock_recv_cmsgs(msg, sk, skb);
 	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
 		sock_write_timestamp(sk, skb->tstamp);
-	else if (unlikely(sk->sk_stamp == SK_DEFAULT_STAMP))
+	else if (unlikely(sock_read_timestamp(sk) == SK_DEFAULT_STAMP))
 		sock_write_timestamp(sk, 0);
 }

@@ -54,7 +54,7 @@ void dim_park_tired(struct dim *dim)
 }
 EXPORT_SYMBOL(dim_park_tired);
 
-void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
+bool dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 		    struct dim_stats *curr_stats)
 {
 	/* u32 holds up to 71 minutes, should be enough */
@@ -66,7 +66,7 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 			     start->comp_ctr);
 
 	if (!delta_us)
-		return;
+		return false;
 
 	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
 	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
@@ -79,5 +79,6 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 	else
 		curr_stats->cpe_ratio = 0;
 
+	return true;
 }
 EXPORT_SYMBOL(dim_calc_stats);

@@ -227,7 +227,8 @@ void net_dim(struct dim *dim, struct dim_sample end_sample)
 				  dim->start_sample.event_ctr);
 		if (nevents < DIM_NEVENTS)
 			break;
-		dim_calc_stats(&dim->start_sample, &end_sample, &curr_stats);
+		if (!dim_calc_stats(&dim->start_sample, &end_sample, &curr_stats))
+			break;
 		if (net_dim_decision(&curr_stats, dim)) {
 			dim->state = DIM_APPLY_NEW_PROFILE;
 			schedule_work(&dim->work);

@@ -88,7 +88,8 @@ void rdma_dim(struct dim *dim, u64 completions)
 		nevents = curr_sample->event_ctr - dim->start_sample.event_ctr;
 		if (nevents < DIM_NEVENTS)
 			break;
-		dim_calc_stats(&dim->start_sample, curr_sample, &curr_stats);
+		if (!dim_calc_stats(&dim->start_sample, curr_sample, &curr_stats))
+			break;
 		if (rdma_dim_decision(&curr_stats, dim)) {
 			dim->state = DIM_APPLY_NEW_PROFILE;
 			schedule_work(&dim->work);

@@ -42,7 +42,7 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
 	    eth_type_vlan(skb->protocol)) {
 		int depth;
 
-		if (!__vlan_get_protocol(skb, skb->protocol, &depth))
+		if (!vlan_get_protocol_and_depth(skb, skb->protocol, &depth))
 			goto drop;
 
 		skb_set_network_header(skb, depth);

@@ -807,18 +807,21 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
 {
 	struct sock *sk = sock->sk;
 	__poll_t mask;
+	u8 shutdown;
 
 	sock_poll_wait(file, sock, wait);
 	mask = 0;
 
 	/* exceptional events? */
-	if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
+	if (READ_ONCE(sk->sk_err) ||
+	    !skb_queue_empty_lockless(&sk->sk_error_queue))
 		mask |= EPOLLERR |
 			(sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
 
-	if (sk->sk_shutdown & RCV_SHUTDOWN)
+	shutdown = READ_ONCE(sk->sk_shutdown);
+	if (shutdown & RCV_SHUTDOWN)
 		mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
-	if (sk->sk_shutdown == SHUTDOWN_MASK)
+	if (shutdown == SHUTDOWN_MASK)
 		mask |= EPOLLHUP;
 
 	/* readable? */
@@ -827,10 +830,12 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
 
 	/* Connection-based need to check for termination and startup */
 	if (connection_based(sk)) {
-		if (sk->sk_state == TCP_CLOSE)
+		int state = READ_ONCE(sk->sk_state);
+
+		if (state == TCP_CLOSE)
 			mask |= EPOLLHUP;
 		/* connection hasn't started yet? */
-		if (sk->sk_state == TCP_SYN_SENT)
+		if (state == TCP_SYN_SENT)
 			return mask;
 	}
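
The same annotation idiom recurs throughout this series: a field written under the socket lock but read locklessly (from poll(), /proc, or a wait condition) gets one READ_ONCE() snapshot on the reader side and WRITE_ONCE() on the writer side, so the compiler can neither tear the access nor reload the field between two checks. A minimal kernel-style sketch with hypothetical names:

	static u8 flags;	/* written under a lock, read locklessly */

	static void writer_side(u8 mode)	/* called with the lock held */
	{
		WRITE_ONCE(flags, flags | mode);
	}

	static int lockless_reader(void)
	{
		u8 snap = READ_ONCE(flags);	/* one snapshot for all tests */

		/* both conditions are guaranteed to see the same value */
		return (snap & 1) || (snap == 3);
	}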

@@ -3335,7 +3335,7 @@ __be16 skb_network_protocol(struct sk_buff *skb, int *depth)
 		type = eth->h_proto;
 	}
 
-	return __vlan_get_protocol(skb, type, depth);
+	return vlan_get_protocol_and_depth(skb, type, depth);
 }
 
 /* openvswitch calls this on rx path, so we need a different check.

@@ -5298,7 +5298,7 @@ bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off)
 	u32 csum_end = (u32)start + (u32)off + sizeof(__sum16);
 	u32 csum_start = skb_headroom(skb) + (u32)start;
 
-	if (unlikely(csum_start > U16_MAX || csum_end > skb_headlen(skb))) {
+	if (unlikely(csum_start >= U16_MAX || csum_end > skb_headlen(skb))) {
 		net_warn_ratelimited("bad partial csum: csum=%u/%u headroom=%u headlen=%u\n",
 				     start, off, skb_headroom(skb), skb_headlen(skb));
 		return false;
@@ -5306,7 +5306,7 @@ bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off)
 	skb->ip_summed = CHECKSUM_PARTIAL;
 	skb->csum_start = csum_start;
 	skb->csum_offset = off;
-	skb_set_transport_header(skb, start);
+	skb->transport_header = csum_start;
 	return true;
 }
 EXPORT_SYMBOL_GPL(skb_partial_csum_set);
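
The '>' to '>=' change is about the transport-header sentinel (the "magic value" named in the pull message): skb->transport_header is a u16 offset, and the all-ones value 0xFFFF (U16_MAX) is reserved as the "not set" marker, roughly:

	/* paraphrased from include/linux/skbuff.h */
	static inline bool skb_transport_header_was_set(const struct sk_buff *skb)
	{
		return skb->transport_header != (typeof(skb->transport_header))~0U;
	}

Since the function now stores csum_start straight into skb->transport_header, a csum_start equal to U16_MAX would be indistinguishable from the sentinel, so it must be rejected too.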

@@ -73,8 +73,8 @@ int sk_stream_wait_connect(struct sock *sk, long *timeo_p)
 		add_wait_queue(sk_sleep(sk), &wait);
 		sk->sk_write_pending++;
 		done = sk_wait_event(sk, timeo_p,
-				     !sk->sk_err &&
-				     !((1 << sk->sk_state) &
+				     !READ_ONCE(sk->sk_err) &&
+				     !((1 << READ_ONCE(sk->sk_state)) &
 				       ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)), &wait);
 		remove_wait_queue(sk_sleep(sk), &wait);
 		sk->sk_write_pending--;
@@ -87,9 +87,9 @@ EXPORT_SYMBOL(sk_stream_wait_connect);
  * sk_stream_closing - Return 1 if we still have things to send in our buffers.
  * @sk: socket to verify
  */
-static inline int sk_stream_closing(struct sock *sk)
+static int sk_stream_closing(const struct sock *sk)
 {
-	return (1 << sk->sk_state) &
+	return (1 << READ_ONCE(sk->sk_state)) &
 	       (TCPF_FIN_WAIT1 | TCPF_CLOSING | TCPF_LAST_ACK);
 }
 
@@ -142,8 +142,8 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		sk->sk_write_pending++;
-		sk_wait_event(sk, &current_timeo, sk->sk_err ||
-			      (sk->sk_shutdown & SEND_SHUTDOWN) ||
+		sk_wait_event(sk, &current_timeo, READ_ONCE(sk->sk_err) ||
+			      (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN) ||
 			      (sk_stream_memory_free(sk) &&
 			       !vm_wait), &wait);
 		sk->sk_write_pending--;

@@ -894,7 +894,7 @@ int inet_shutdown(struct socket *sock, int how)
 		   EPOLLHUP, even on eg. unconnected UDP sockets -- RR */
 		fallthrough;
 	default:
-		sk->sk_shutdown |= how;
+		WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | how);
 		if (sk->sk_prot->shutdown)
 			sk->sk_prot->shutdown(sk, how);
 		break;

@@ -498,6 +498,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	__poll_t mask;
 	struct sock *sk = sock->sk;
 	const struct tcp_sock *tp = tcp_sk(sk);
+	u8 shutdown;
 	int state;
 
 	sock_poll_wait(file, sock, wait);
@@ -540,9 +541,10 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	 * NOTE. Check for TCP_CLOSE is added. The goal is to prevent
 	 * blocking on fresh not-connected or disconnected socket. --ANK
 	 */
-	if (sk->sk_shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
+	shutdown = READ_ONCE(sk->sk_shutdown);
+	if (shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
 		mask |= EPOLLHUP;
-	if (sk->sk_shutdown & RCV_SHUTDOWN)
+	if (shutdown & RCV_SHUTDOWN)
 		mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
 
 	/* Connected or passive Fast Open socket? */
@@ -559,7 +561,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 		if (tcp_stream_is_readable(sk, target))
 			mask |= EPOLLIN | EPOLLRDNORM;
 
-		if (!(sk->sk_shutdown & SEND_SHUTDOWN)) {
+		if (!(shutdown & SEND_SHUTDOWN)) {
 			if (__sk_stream_is_writeable(sk, 1)) {
 				mask |= EPOLLOUT | EPOLLWRNORM;
 			} else {  /* send SIGIO later */
@@ -2867,7 +2869,7 @@ void __tcp_close(struct sock *sk, long timeout)
 	int data_was_unread = 0;
 	int state;
 
-	sk->sk_shutdown = SHUTDOWN_MASK;
+	WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
 
 	if (sk->sk_state == TCP_LISTEN) {
 		tcp_set_state(sk, TCP_CLOSE);
@@ -3119,7 +3121,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	inet_bhash2_reset_saddr(sk);
 
-	sk->sk_shutdown = 0;
+	WRITE_ONCE(sk->sk_shutdown, 0);
 	sock_reset_flag(sk, SOCK_DONE);
 	tp->srtt_us = 0;
 	tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
@@ -4649,7 +4651,7 @@ void tcp_done(struct sock *sk)
 	if (req)
 		reqsk_fastopen_remove(sk, req, false);
 
-	sk->sk_shutdown = SHUTDOWN_MASK;
+	WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
 
 	if (!sock_flag(sk, SOCK_DEAD))
 		sk->sk_state_change(sk);

@@ -168,7 +168,7 @@ static int tcp_msg_wait_data(struct sock *sk, struct sk_psock *psock,
 	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 	ret = sk_wait_event(sk, &timeo,
 			    !list_empty(&psock->ingress_msg) ||
-			    !skb_queue_empty(&sk->sk_receive_queue), &wait);
+			    !skb_queue_empty_lockless(&sk->sk_receive_queue), &wait);
 	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 	remove_wait_queue(sk_sleep(sk), &wait);
 	return ret;

@@ -4362,7 +4362,7 @@ void tcp_fin(struct sock *sk)
 
 	inet_csk_schedule_ack(sk);
 
-	sk->sk_shutdown |= RCV_SHUTDOWN;
+	WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | RCV_SHUTDOWN);
 	sock_set_flag(sk, SOCK_DONE);
 
 	switch (sk->sk_state) {
@@ -6599,7 +6599,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 			break;
 
 		tcp_set_state(sk, TCP_FIN_WAIT2);
-		sk->sk_shutdown |= SEND_SHUTDOWN;
+		WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | SEND_SHUTDOWN);
 
 		sk_dst_confirm(sk);

@@ -583,7 +583,8 @@ static int llc_ui_wait_for_disc(struct sock *sk, long timeout)
 	add_wait_queue(sk_sleep(sk), &wait);
 	while (1) {
-		if (sk_wait_event(sk, &timeout, sk->sk_state == TCP_CLOSE, &wait))
+		if (sk_wait_event(sk, &timeout,
+				  READ_ONCE(sk->sk_state) == TCP_CLOSE, &wait))
 			break;
 		rc = -ERESTARTSYS;
 		if (signal_pending(current))
@@ -603,7 +604,8 @@ static bool llc_ui_wait_for_conn(struct sock *sk, long timeout)
 	add_wait_queue(sk_sleep(sk), &wait);
 	while (1) {
-		if (sk_wait_event(sk, &timeout, sk->sk_state != TCP_SYN_SENT, &wait))
+		if (sk_wait_event(sk, &timeout,
+				  READ_ONCE(sk->sk_state) != TCP_SYN_SENT, &wait))
 			break;
 		if (signal_pending(current) || !timeout)
 			break;
@@ -622,7 +624,7 @@ static int llc_ui_wait_for_busy_core(struct sock *sk, long timeout)
 	while (1) {
 		rc = 0;
 		if (sk_wait_event(sk, &timeout,
-				  (sk->sk_shutdown & RCV_SHUTDOWN) ||
+				  (READ_ONCE(sk->sk_shutdown) & RCV_SHUTDOWN) ||
 				  (!llc_data_accept_state(llc->state) &&
 				   !llc->remote_busy_flag &&
 				   !llc->p_flag), &wait))

@@ -711,9 +711,11 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct)
 
 	rcu_read_lock();
 	ct_hook = rcu_dereference(nf_ct_hook);
-	BUG_ON(ct_hook == NULL);
-	ct_hook->destroy(nfct);
+	if (ct_hook)
+		ct_hook->destroy(nfct);
 	rcu_read_unlock();
+
+	WARN_ON(!ct_hook);
 }
 EXPORT_SYMBOL(nf_conntrack_destroy);

@@ -1218,11 +1218,12 @@ static int __init nf_conntrack_standalone_init(void)
 	nf_conntrack_htable_size_user = nf_conntrack_htable_size;
 #endif
 
+	nf_conntrack_init_end();
+
 	ret = register_pernet_subsys(&nf_conntrack_net_ops);
 	if (ret < 0)
 		goto out_pernet;
 
-	nf_conntrack_init_end();
 	return 0;
 
 out_pernet:

@@ -344,6 +344,12 @@ static void nft_netdev_event(unsigned long event, struct net_device *dev,
 		return;
 	}
 
+	/* UNREGISTER events are also happening on netns exit.
+	 *
+	 * Although nf_tables core releases all tables/chains, only this event
+	 * handler provides guarantee that hook->ops.dev is still accessible,
+	 * so we cannot skip exiting net namespaces.
+	 */
 	__nft_release_basechain(ctx);
 }
 
@@ -362,9 +368,6 @@ static int nf_tables_netdev_event(struct notifier_block *this,
 	    event != NETDEV_CHANGENAME)
 		return NOTIFY_DONE;
 
-	if (!check_net(ctx.net))
-		return NOTIFY_DONE;
-
 	nft_net = nft_pernet(ctx.net);
 	mutex_lock(&nft_net->commit_mutex);
 	list_for_each_entry(table, &nft_net->tables, list) {

@@ -1990,7 +1990,7 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	skb_free_datagram(sk, skb);
 
-	if (nlk->cb_running &&
+	if (READ_ONCE(nlk->cb_running) &&
 	    atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
 		ret = netlink_dump(sk);
 		if (ret) {
@@ -2302,7 +2302,7 @@ static int netlink_dump(struct sock *sk)
 	if (cb->done)
 		cb->done(cb);
 
-	nlk->cb_running = false;
+	WRITE_ONCE(nlk->cb_running, false);
 	module = cb->module;
 	skb = cb->skb;
 	mutex_unlock(nlk->cb_mutex);
@@ -2365,7 +2365,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 		goto error_put;
 	}
 
-	nlk->cb_running = true;
+	WRITE_ONCE(nlk->cb_running, true);
 	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
@@ -2703,7 +2703,7 @@ static int netlink_native_seq_show(struct seq_file *seq, void *v)
 			   nlk->groups ? (u32)nlk->groups[0] : 0,
 			   sk_rmem_alloc_get(s),
 			   sk_wmem_alloc_get(s),
-			   nlk->cb_running,
+			   READ_ONCE(nlk->cb_running),
 			   refcount_read(&s->sk_refcnt),
 			   atomic_read(&s->sk_drops),
 			   sock_i_ino(s)

@@ -1934,10 +1934,8 @@ static void packet_parse_headers(struct sk_buff *skb, struct socket *sock)
 	/* Move network header to the right position for VLAN tagged packets */
 	if (likely(skb->dev->type == ARPHRD_ETHER) &&
 	    eth_type_vlan(skb->protocol) &&
-	    __vlan_get_protocol(skb, skb->protocol, &depth) != 0) {
-		if (pskb_may_pull(skb, depth))
-			skb_set_network_header(skb, depth);
-	}
+	    vlan_get_protocol_and_depth(skb, skb->protocol, &depth) != 0)
+		skb_set_network_header(skb, depth);
 
 	skb_probe_transport_header(skb);
 }

@@ -67,8 +67,8 @@ static void smc_close_stream_wait(struct smc_sock *smc, long timeout)
 
 		rc = sk_wait_event(sk, &timeout,
 				   !smc_tx_prepared_sends(&smc->conn) ||
-				   sk->sk_err == ECONNABORTED ||
-				   sk->sk_err == ECONNRESET ||
+				   READ_ONCE(sk->sk_err) == ECONNABORTED ||
+				   READ_ONCE(sk->sk_err) == ECONNRESET ||
 				   smc->conn.killed,
 				   &wait);
 		if (rc)

@@ -267,9 +267,9 @@ int smc_rx_wait(struct smc_sock *smc, long *timeo,
 	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 	add_wait_queue(sk_sleep(sk), &wait);
 	rc = sk_wait_event(sk, timeo,
-			   sk->sk_err ||
+			   READ_ONCE(sk->sk_err) ||
 			   cflags->peer_conn_abort ||
-			   sk->sk_shutdown & RCV_SHUTDOWN ||
+			   READ_ONCE(sk->sk_shutdown) & RCV_SHUTDOWN ||
 			   conn->killed ||
 			   fcrit(conn),
 			   &wait);

@@ -113,8 +113,8 @@ static int smc_tx_wait(struct smc_sock *smc, int flags)
 			break; /* at least 1 byte of free & no urgent data */
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		sk_wait_event(sk, &timeo,
-			      sk->sk_err ||
-			      (sk->sk_shutdown & SEND_SHUTDOWN) ||
+			      READ_ONCE(sk->sk_err) ||
+			      (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN) ||
 			      smc_cdc_rxed_any_close(conn) ||
 			      (atomic_read(&conn->sndbuf_space) &&
 			       !conn->urg_tx_pend),

@@ -2911,7 +2911,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 		 * error to return on the next call or if the
 		 * app asks about it using getsockopt(SO_ERROR).
 		 */
-		sock->sk->sk_err = -err;
+		WRITE_ONCE(sock->sk->sk_err, -err);
 	}
 out_put:
 	fput_light(sock->file, fput_needed);

@@ -314,9 +314,9 @@ static void tsk_rej_rx_queue(struct sock *sk, int error)
 		tipc_sk_respond(sk, skb, error);
 }
 
-static bool tipc_sk_connected(struct sock *sk)
+static bool tipc_sk_connected(const struct sock *sk)
 {
-	return sk->sk_state == TIPC_ESTABLISHED;
+	return READ_ONCE(sk->sk_state) == TIPC_ESTABLISHED;
 }
 
 /* tipc_sk_type_connectionless - check if the socket is datagram socket

@@ -111,7 +111,8 @@ int wait_on_pending_writer(struct sock *sk, long *timeo)
 			break;
 		}
 
-		if (sk_wait_event(sk, timeo, !sk->sk_write_pending, &wait))
+		if (sk_wait_event(sk, timeo,
+				  !READ_ONCE(sk->sk_write_pending), &wait))
 			break;
 	}
 	remove_wait_queue(sk_sleep(sk), &wait);

@@ -603,7 +603,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
 	/* Clear state */
 	unix_state_lock(sk);
 	sock_orphan(sk);
-	sk->sk_shutdown = SHUTDOWN_MASK;
+	WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
 	path	     = u->path;
 	u->path.dentry = NULL;
 	u->path.mnt = NULL;
@@ -628,7 +628,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
 		if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) {
 			unix_state_lock(skpair);
 			/* No more writes */
-			skpair->sk_shutdown = SHUTDOWN_MASK;
+			WRITE_ONCE(skpair->sk_shutdown, SHUTDOWN_MASK);
 			if (!skb_queue_empty(&sk->sk_receive_queue) || embrion)
 				WRITE_ONCE(skpair->sk_err, ECONNRESET);
 			unix_state_unlock(skpair);
@@ -1442,7 +1442,7 @@ static long unix_wait_for_peer(struct sock *other, long timeo)
 
 	sched = !sock_flag(other, SOCK_DEAD) &&
 		!(other->sk_shutdown & RCV_SHUTDOWN) &&
-		unix_recvq_full(other);
+		unix_recvq_full_lockless(other);
 
 	unix_state_unlock(other);
 
@@ -3008,7 +3008,7 @@ static int unix_shutdown(struct socket *sock, int mode)
 	++mode;
 
 	unix_state_lock(sk);
-	sk->sk_shutdown |= mode;
+	WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | mode);
 	other = unix_peer(sk);
 	if (other)
 		sock_hold(other);
@@ -3028,7 +3028,7 @@ static int unix_shutdown(struct socket *sock, int mode)
 		if (mode&SEND_SHUTDOWN)
 			peer_mode |= RCV_SHUTDOWN;
 		unix_state_lock(other);
-		other->sk_shutdown |= peer_mode;
+		WRITE_ONCE(other->sk_shutdown, other->sk_shutdown | peer_mode);
 		unix_state_unlock(other);
 		other->sk_state_change(other);
 		if (peer_mode == SHUTDOWN_MASK)
@@ -3160,16 +3160,18 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wait)
 {
 	struct sock *sk = sock->sk;
 	__poll_t mask;
+	u8 shutdown;
 
 	sock_poll_wait(file, sock, wait);
 	mask = 0;
+	shutdown = READ_ONCE(sk->sk_shutdown);
 
 	/* exceptional events? */
 	if (READ_ONCE(sk->sk_err))
 		mask |= EPOLLERR;
-	if (sk->sk_shutdown == SHUTDOWN_MASK)
+	if (shutdown == SHUTDOWN_MASK)
 		mask |= EPOLLHUP;
-	if (sk->sk_shutdown & RCV_SHUTDOWN)
+	if (shutdown & RCV_SHUTDOWN)
 		mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
 
 	/* readable? */
@@ -3203,9 +3205,11 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 	struct sock *sk = sock->sk, *other;
 	unsigned int writable;
 	__poll_t mask;
+	u8 shutdown;
 
 	sock_poll_wait(file, sock, wait);
 	mask = 0;
+	shutdown = READ_ONCE(sk->sk_shutdown);
 
 	/* exceptional events? */
 	if (READ_ONCE(sk->sk_err) ||
@@ -3213,9 +3217,9 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 		mask |= EPOLLERR |
 			(sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0);
 
-	if (sk->sk_shutdown & RCV_SHUTDOWN)
+	if (shutdown & RCV_SHUTDOWN)
 		mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
-	if (sk->sk_shutdown == SHUTDOWN_MASK)
+	if (shutdown == SHUTDOWN_MASK)
 		mask |= EPOLLHUP;
 
 	/* readable? */

@@ -6,6 +6,7 @@
 ALL_TESTS="
 	prio
 	arp_validate
+	num_grat_arp
 "
 
 REQUIRE_MZ=no
@@ -255,6 +256,55 @@ arp_validate()
 	arp_validate_ns "active-backup"
 }
 
+garp_test()
+{
+	local param="$1"
+	local active_slave exp_num real_num i
+	RET=0
+
+	# create bond
+	bond_reset "${param}"
+
+	bond_check_connection
+	[ $RET -ne 0 ] && log_test "num_grat_arp" "$retmsg"
+
+	# Add tc rules to count GARP number
+	for i in $(seq 0 2); do
+		tc -n ${g_ns} filter add dev s$i ingress protocol arp pref 1 handle 101 \
+			flower skip_hw arp_op request arp_sip ${s_ip4} arp_tip ${s_ip4} action pass
+	done
+
+	# Do failover
+	active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
+	ip -n ${s_ns} link set ${active_slave} down
+
+	exp_num=$(echo "${param}" | cut -f6 -d ' ')
+	sleep $((exp_num + 2))
+
+	active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
+
+	# check result
+	real_num=$(tc_rule_handle_stats_get "dev s${active_slave#eth} ingress" 101 ".packets" "-n ${g_ns}")
+	if [ "${real_num}" -ne "${exp_num}" ]; then
+		echo "$real_num garp packets sent on active slave ${active_slave}"
+		RET=1
+	fi
+
+	for i in $(seq 0 2); do
+		tc -n ${g_ns} filter del dev s$i ingress
+	done
+}
+
+num_grat_arp()
+{
+	local val
+	for val in 10 20 30 50; do
+		garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000"
+		log_test "num_grat_arp" "active-backup miimon num_grat_arp $val"
+	done
+}
+
 trap cleanup EXIT
 
 setup_prepare

@@ -61,6 +61,8 @@ server_create()
 		ip -n ${g_ns} link set s${i} up
 		ip -n ${g_ns} link set s${i} master br0
 		ip -n ${s_ns} link set eth${i} master bond0
+
+		tc -n ${g_ns} qdisc add dev s${i} clsact
 	done
 
 	ip -n ${s_ns} link set bond0 up

@@ -791,8 +791,9 @@ tc_rule_handle_stats_get()
 	local id=$1; shift
 	local handle=$1; shift
 	local selector=${1:-.packets}; shift
+	local netns=${1:-""}; shift
 
-	tc -j -s filter show $id \
+	tc $netns -j -s filter show $id \
 		| jq ".[] | select(.options.handle == $handle) | \
 		       .options.actions[0].stats$selector"
 }

@@ -188,6 +188,26 @@ if [ $? -ne 0 ]; then
 	exit $ksft_skip
 fi
 
+ip netns exec $ns2 nft -f - <<EOF
+table inet filter {
+   counter ip4dscp0 { }
+   counter ip4dscp3 { }
+
+   chain input {
+      type filter hook input priority 0; policy accept;
+      meta l4proto tcp goto {
+              ip dscp cs3 counter name ip4dscp3 accept
+              ip dscp 0 counter name ip4dscp0 accept
+      }
+   }
+}
+EOF
+
+if [ $? -ne 0 ]; then
+	echo "SKIP: Could not load nft ruleset"
+	exit $ksft_skip
+fi
+
 # test basic connectivity
 if ! ip netns exec $ns1 ping -c 1 -q 10.0.2.99 > /dev/null; then
   echo "ERROR: $ns1 cannot reach ns2" 1>&2
@@ -255,6 +275,60 @@ check_counters()
 	fi
 }
 
+check_dscp()
+{
+	local what=$1
+	local ok=1
+
+	local counter=$(ip netns exec $ns2 nft reset counter inet filter ip4dscp3 | grep packets)
+
+	local pc4=${counter%*bytes*}
+	local pc4=${pc4#*packets}
+
+	local counter=$(ip netns exec $ns2 nft reset counter inet filter ip4dscp0 | grep packets)
+	local pc4z=${counter%*bytes*}
+	local pc4z=${pc4z#*packets}
+
+	case "$what" in
+	"dscp_none")
+		if [ $pc4 -gt 0 ] || [ $pc4z -eq 0 ]; then
+			echo "FAIL: dscp counters do not match, expected dscp3 == 0, dscp0 > 0, but got $pc4,$pc4z" 1>&2
+			ret=1
+			ok=0
+		fi
+		;;
+	"dscp_fwd")
+		if [ $pc4 -eq 0 ] || [ $pc4z -eq 0 ]; then
+			echo "FAIL: dscp counters do not match, expected dscp3 and dscp0 > 0 but got $pc4,$pc4z" 1>&2
+			ret=1
+			ok=0
+		fi
+		;;
+	"dscp_ingress")
+		if [ $pc4 -eq 0 ] || [ $pc4z -gt 0 ]; then
+			echo "FAIL: dscp counters do not match, expected dscp3 > 0, dscp0 == 0 but got $pc4,$pc4z" 1>&2
+			ret=1
+			ok=0
+		fi
+		;;
+	"dscp_egress")
+		if [ $pc4 -eq 0 ] || [ $pc4z -gt 0 ]; then
+			echo "FAIL: dscp counters do not match, expected dscp3 > 0, dscp0 == 0 but got $pc4,$pc4z" 1>&2
+			ret=1
+			ok=0
+		fi
+		;;
+	*)
+		echo "FAIL: Unknown DSCP check" 1>&2
+		ret=1
+		ok=0
+	esac
+
+	if [ $ok -eq 1 ] ;then
+		echo "PASS: $what: dscp packet counters match"
+	fi
+}
+
 check_transfer()
 {
 	in=$1
@@ -286,17 +360,26 @@ test_tcp_forwarding_ip()
 	ip netns exec $nsa nc -w 4 "$dstip" "$dstport" < "$nsin" > "$ns1out" &
 	cpid=$!
 
-	sleep 3
+	sleep 1
+	prev="$(ls -l $ns1out $ns2out)"
+	sleep 1
 
-	if ps -p $lpid > /dev/null;then
+	while [[ "$prev" != "$(ls -l $ns1out $ns2out)" ]]; do
+		sleep 1;
+		prev="$(ls -l $ns1out $ns2out)"
+	done
+
+	if test -d /proc/"$lpid"/; then
 		kill $lpid
 	fi
 
-	if ps -p $cpid > /dev/null;then
+	if test -d /proc/"$cpid"/; then
 		kill $cpid
 	fi
 
-	wait
+	wait $lpid
+	wait $cpid
 
 	if ! check_transfer "$nsin" "$ns2out" "ns1 -> ns2"; then
 		lret=1
@@ -316,6 +399,51 @@ test_tcp_forwarding()
 	return $?
 }
 
+test_tcp_forwarding_set_dscp()
+{
+	check_dscp "dscp_none"
+
+ip netns exec $nsr1 nft -f - <<EOF
+table netdev dscpmangle {
+   chain setdscp0 {
+      type filter hook ingress device "veth0" priority 0; policy accept
+	ip dscp set cs3
+  }
+}
+EOF
+	if [ $? -eq 0 ]; then
+		test_tcp_forwarding_ip "$1" "$2"  10.0.2.99 12345
+		check_dscp "dscp_ingress"
+
+		ip netns exec $nsr1 nft delete table netdev dscpmangle
+	else
+		echo "SKIP: Could not load netdev:ingress for veth0"
+	fi
+
+ip netns exec $nsr1 nft -f - <<EOF
+table netdev dscpmangle {
+   chain setdscp0 {
+      type filter hook egress device "veth1" priority 0; policy accept
+      ip dscp set cs3
+  }
+}
+EOF
+	if [ $? -eq 0 ]; then
+		test_tcp_forwarding_ip "$1" "$2"  10.0.2.99 12345
+		check_dscp "dscp_egress"
+
+		ip netns exec $nsr1 nft flush table netdev dscpmangle
+	else
+		echo "SKIP: Could not load netdev:egress for veth1"
+	fi
+
+	# partial. If flowtable really works, then both dscp-is-0 and dscp-is-cs3
+	# counters should have seen packets (before and after ft offload kicks in).
+	ip netns exec $nsr1 nft -a insert rule inet filter forward ip dscp set cs3
+	test_tcp_forwarding_ip "$1" "$2"  10.0.2.99 12345
+	check_dscp "dscp_fwd"
+}
+
 test_tcp_forwarding_nat()
 {
 	local lret
@@ -385,6 +513,11 @@ table ip nat {
 }
 EOF
 
+if ! test_tcp_forwarding_set_dscp $ns1 $ns2 0 ""; then
+	echo "FAIL: flow offload for ns1/ns2 with dscp update" 1>&2
+	exit 0
+fi
+
 if ! test_tcp_forwarding_nat $ns1 $ns2 0 ""; then
 	echo "FAIL: flow offload for ns1/ns2 with NAT" 1>&2
 	ip netns exec $nsr1 nft list ruleset
@@ -489,8 +622,8 @@ ip -net $nsr1 addr add 10.0.1.1/24 dev veth0
 ip -net $nsr1 addr add dead:1::1/64 dev veth0
 ip -net $nsr1 link set up dev veth0
 
-KEY_SHA="0x"$(ps -xaf | sha1sum | cut -d " " -f 1)
-KEY_AES="0x"$(ps -xaf | md5sum | cut -d " " -f 1)
+KEY_SHA="0x"$(ps -af | sha1sum | cut -d " " -f 1)
+KEY_AES="0x"$(ps -af | md5sum | cut -d " " -f 1)
 SPI1=$RANDOM
 SPI2=$RANDOM