The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Acked-by: Helge Deller <deller@gmx.de> # for parisc
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the value
to be returned to user space.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netdev drivers are expected to call dev_{uc,mc}_sync() in their
ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method.
This is mentioned in the kerneldoc for those dev_* functions.
The bonding driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of
ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have
already been emptied in unregister_netdevice_many() before ndo_uninit is
called. This mistake can result in addresses being leftover on former bond
slaves after a bond has been deleted; see test_LAG_cleanup() in the last
patch in this series.
Add unsync calls, via bond_hw_addr_flush(), at their expected location,
bond_close().
Add dev_mc_add() call to bond_open() to match the above change.
v3:
* When adding or deleting a slave, only sync/unsync, add/del addresses if
the bond is up. In other cases, it is taken care of at the right time by
ndo_open/ndo_set_rx_mode/ndo_stop.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are already a few definitions of arrays containing
MULTICAST_LACPDU_ADDR and the next patch will add one more use. These all
contain the same constant data so define one common instance for all
bonding code.
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/freescale/fec.h
7d650df99d ("net: fec: add pm_qos support on imx6q platform")
40c79ce13b ("net: fec: add stop mode support for imx8 platform")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The unsolicited NA message with all-nodes multicast dest address should
be valid, as this also means the link could reach the target.
Also rename bond_validate_ns() to bond_validate_na().
Reported-by: LiLiang <liali@redhat.com>
Fixes: 5e1eeef69c ("bonding: NS target should accept link local address")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ns_ip6_target was set, the ipv6_dev_get_saddr() will be called to get
available source address and send IPv6 neighbor solicit message.
If the target is global address, ipv6_dev_get_saddr() will get any
available src address. But if the target is link local address,
ipv6_dev_get_saddr() will only get available address from our interface,
i.e. the corresponding bond interface.
But before bond interface up, all the address is tentative, while
ipv6_dev_get_saddr() will ignore tentative address. This makes we can't
find available link local src address, then bond_ns_send() will not be
called and no NS message was sent. Finally bond interface will keep in
down state.
Fix this by sending NS with unspecified address if there is no available
source address.
Reported-by: LiLiang <liali@redhat.com>
Fixes: 5e1eeef69c ("bonding: NS target should accept link local address")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The value is only ever set once in bond_3ad_initialize and only ever
read otherwise. There seems to be no reason to set the variable via
bond_3ad_initialize when setting the global variable will do. Change
ad_ticks_per_sec to a const to enforce its read-only usage.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is caused by the global variable ad_ticks_per_sec being zero as
demonstrated by the reproducer script discussed below. This causes
all timer values in __ad_timer_to_ticks to be zero, resulting
in the periodic timer to never fire.
To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.
line 44: ip link add fbond type bond mode 4 miimon 200 \
xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr
line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
bond->dev->dev_addr = new_mac
line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr
line 60: ip link set veth1-bond down master fbond
given:
params.ad_actor_system = 0
params.mode = BOND_MODE_8023AD
ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
bond_enslave
-> bond_3ad_initialize(); because first slave
-> if ad.system.sys_mac_addr != bond->dev->dev_addr
return
results:
Nothing is run in bond_3ad_initialize() because dev_addr equals
sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
never initialized anywhere else.
The if check around the contents of bond_3ad_initialize() is no longer
needed due to commit 5ee14e6d33 ("bonding: 3ad: apply ad_actor settings
changes immediately") which sets ad.system.sys_mac_addr if any one of
the bonding parameters whos set function calls
bond_3ad_update_ad_actor_settings(). This is because if
ad.system.sys_mac_addr is zero it will be set to the current bond mac
address, this causes the if check to never be true.
Fixes: 5ee14e6d33 ("bonding: 3ad: apply ad_actor settings changes immediately")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit d5410ac7b0 ("net:bonding:support balance-alb interface
with vlan to bridge") introduced a reference count leak by not releasing
the reference acquired by ip_dev_find(). Remedy this by insuring the
reference is released.
Fixes: d5410ac7b0 ("net:bonding:support balance-alb interface with vlan to bridge")
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/26758.1660194413@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, tls_device_down synchronizes with tls_device_resync_rx using
RCU, however, the pointer to netdev is stored using WRITE_ONCE and
loaded using READ_ONCE.
Although such approach is technically correct (rcu_dereference is
essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
NULL), using special RCU helpers for pointers is more valid, as it
includes additional checks and might change the implementation
transparently to the callers.
Mark the netdev pointer as __rcu and use the correct RCU helpers to
access it. For non-concurrent access pass the right conditions that
guarantee safe access (locks taken, refcount value). Also use the
correct helper in mlx5e, where even READ_ONCE was missing.
The transition to RCU exposes existing issues, fixed by this commit:
1. bond_tls_device_xmit could read netdev twice, and it could become
NULL the second time, after the NULL check passed.
2. Drivers shouldn't stop processing the last packet if tls_device_down
just set netdev to NULL, before tls_dev_del was called. This prevents a
possible packet drop when transitioning to the fallback software mode.
Fixes: 89df6a8104 ("net/bonding: Implement TLS TX device offload")
Fixes: c55dcdd435 ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In my test, balance-alb bonding with two slaves eth0 and eth1,
and then Bond0.150 is created with vlan id attached bond0.
After adding bond0.150 into one linux bridge, I noted that Bond0,
bond0.150 and bridge were assigned to the same MAC as eth0.
Once bond0.150 receives a packet whose dest IP is bridge's
and dest MAC is eth1's, the linux bridge will not match
eth1's MAC entry in FDB, and not handle it as expected.
The patch fix the issue, and diagram as below:
eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac)
|
bond0.150(mac:eth0_mac)
|
bridge(ip:br_ip, mac:eth0_mac)--other port
Suggested-by: Hu Yadi <huyd12@chinatelecom.cn>
Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding driver piggybacks on time stamps kept by the network stack
for the purpose of the netdev TX watchdog, and this is problematic
because it does not work with NETIF_F_LLTX devices.
It is hard to say why the driver looks at dev_trans_start() of the
slave->dev, considering that this is updated even by non-ARP/NS probes
sent by us, and even by traffic not sent by us at all (for example PTP
on physical slave devices). ARP monitoring in active-backup mode appears
to still work even if we track only the last TX time of actual ARP
probes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c
9c5de246c1 ("net: sparx5: mdb add/del handle non-sparx5 devices")
fbb89d02e3 ("net: sparx5: Allow mdb entries to both CPU and ports")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 0622cab034 ("bonding: fix 802.3ad aggregator reselection"),
resolve case, when there is several aggregation groups in the same bond.
bond_3ad_unbind_slave will invalidate (clear) aggregator when
__agg_active_ports return zero. So, ad_clear_agg can be executed even, when
num_of_ports!=0. Than bond_3ad_unbind_slave can be executed again for,
previously cleared aggregator. NOTE: at this time bond_3ad_unbind_slave
will not update slave ports list, because lag_ports==NULL. So, here we
got slave ports, pointing to freed aggregator memory.
Fix with checking actual number of ports in group (as was before
commit 0622cab034 ("bonding: fix 802.3ad aggregator reselection") ),
before ad_clear_agg().
The KASAN logs are as follows:
[ 767.617392] ==================================================================
[ 767.630776] BUG: KASAN: use-after-free in bond_3ad_state_machine_handler+0x13dc/0x1470
[ 767.638764] Read of size 2 at addr ffff00011ba9d430 by task kworker/u8:7/767
[ 767.647361] CPU: 3 PID: 767 Comm: kworker/u8:7 Tainted: G O 5.15.11 #15
[ 767.655329] Hardware name: DNI AmazonGo1 A7040 board (DT)
[ 767.660760] Workqueue: lacp_1 bond_3ad_state_machine_handler
[ 767.666468] Call trace:
[ 767.668930] dump_backtrace+0x0/0x2d0
[ 767.672625] show_stack+0x24/0x30
[ 767.675965] dump_stack_lvl+0x68/0x84
[ 767.679659] print_address_description.constprop.0+0x74/0x2b8
[ 767.685451] kasan_report+0x1f0/0x260
[ 767.689148] __asan_load2+0x94/0xd0
[ 767.692667] bond_3ad_state_machine_handler+0x13dc/0x1470
Fixes: 0622cab034 ("bonding: fix 802.3ad aggregator reselection")
Co-developed-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20220629012914.361-1-yevhen.orlov@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add per port priority support for bonding active slave re-selection during
failover. A higher number means higher priority in selection. The primary
slave still has the highest priority. This option also follows the
primary_reselect rules.
This option could only be configured via netlink.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding ARP monitor fails to decrement send_peer_notif, the
number of peer notifications (gratuitous ARP or ND) to be sent. This
results in a continuous series of notifications.
Correct this by decrementing the counter for each notification.
Reported-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Fixes: b0929915e0 ("bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor")
Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/9400.1655407960@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Setting RLB_NULL_INDEX is not needed as this is done in bond_alb_initialize
which is called by bond_open.
Also reduce the number of rtnl_unlock calls by just using the standard
goto cleanup path.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for reporting errors via extack in both bond_newlink
and bond_changelink.
Instead of having to look in the kernel log for why an option was not
correct just report the error to the user via the extack variable.
What is currently reported today:
ip link add bond0 type bond
ip link set bond0 up
ip link set bond0 type bond mode 4
RTNETLINK answers: Device or resource busy
After this change:
ip link add bond0 type bond
ip link set bond0 up
ip link set bond0 type bond mode 4
Error: unable to set option because the bond is up.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guard ns_targets in struct bond_params by CONFIG_IPV6, which could save
256 bytes if IPv6 not configed. Also add this protection for function
bond_is_ip6_target_ok() and bond_get_targets_ip6().
Remove the IS_ENABLED() check for bond_opts[] as this will make
BOND_OPT_NS_TARGETS uninitialized if CONFIG_IPV6 not enabled. Add
a dummy bond_option_ns_ip6_targets_set() for this situation.
Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20220531063727.224043-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When adding bond new parameter ns_targets. I forgot to print this
in bond master proc info. After updating, the bond master info will look
like:
ARP IP target/s (n.n.n.n form): 192.168.1.254
NS IPv6 target/s (XX::XX form): 2022::1, 2022::2
Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Reported-by: Li Liang <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20220530062639.37179-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When setting bond NS target, we use bond_is_ip6_target_ok() to check
if the address valid. The link local address was wrongly rejected in
bond_changelink(), as most time the user just set the ARP/NS target to
gateway, while the IPv6 gateway is always a link local address when user
set up interface via SLAAC.
So remove the link local addr check when setting bond NS target.
Fixes: 129e3c1bab ("bonding: add new option ns_ip6_target")
Reported-by: Li Liang <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When removing the rcu_read_lock in bond_ethtool_get_ts_info() as
discussed [1], I didn't notice it could be called via setsockopt,
which doesn't hold rcu lock, as syzbot pointed:
stack backtrace:
CPU: 0 PID: 3599 Comm: syz-executor317 Not tainted 5.18.0-rc5-syzkaller-01392-g01f4685797a5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
bond_option_active_slave_get_rcu include/net/bonding.h:353 [inline]
bond_ethtool_get_ts_info+0x32c/0x3a0 drivers/net/bonding/bond_main.c:5595
__ethtool_get_ts_info+0x173/0x240 net/ethtool/common.c:554
ethtool_get_phc_vclocks+0x99/0x110 net/ethtool/common.c:568
sock_timestamping_bind_phc net/core/sock.c:869 [inline]
sock_set_timestamping+0x3a3/0x7e0 net/core/sock.c:916
sock_setsockopt+0x543/0x2ec0 net/core/sock.c:1221
__sys_setsockopt+0x55e/0x6a0 net/socket.c:2223
__do_sys_setsockopt net/socket.c:2238 [inline]
__se_sys_setsockopt net/socket.c:2235 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2235
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8902c8eb39
Fix it by adding rcu_read_lock and take a ref on the real_dev.
Since dev_hold() and dev_put() can take NULL these days, we can
skip checking if real_dev exist.
[1] https://lore.kernel.org/netdev/27565.1642742439@famine/
Reported-by: syzbot+92beb3d46aab498710fa@syzkaller.appspotmail.com
Fixes: aa6034678e ("bonding: use rcu_dereference_rtnl when get bonding active slave")
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220519020148.1058344-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Drivers should call the TSO setting helper, GSO is controllable
by user space.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces some sk_ipv6only tests with ipv6_only_sock().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b5f862180d was introduced to discard lowest hash bit for layer3+4 hashing
but it also removes last bit from non layer3+4 hashing
Below script shows layer2+3 hashing will result in same slave to be used with above commit.
$ cat hash.py
#/usr/bin/python3.6
h_dests=[0xa0, 0xa1]
h_source=0xe3
hproto=0x8
saddr=0x1e7aa8c0
daddr=0x17aa8c0
for h_dest in h_dests:
hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr)
hash ^= hash >> 16
hash ^= hash >> 8
print(hash)
print("with last bit removed")
for h_dest in h_dests:
hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr)
hash ^= hash >> 16
hash ^= hash >> 8
hash = hash >> 1
print(hash)
Output:
$ python3.6 hash.py
522133332
522133333 <-------------- will result in both slaves being used
with last bit removed
261066666
261066666 <-------------- only single slave used
Signed-off-by: suresh kumar <suresh2514@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before adding yet another possibly contended atomic_long_t,
it is time to add per-cpu storage for existing ones:
dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler
Because many devices do not have to increment such counters,
allocate the per-cpu storage on demand, so that dev_get_stats()
does not have to spend considerable time folding zero counters.
Note that some drivers have abused these counters which
were supposed to be only used by core networking stack.
v4: should use per_cpu_ptr() in dev_get_stats() (Jakub)
v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo)
v2: add a missing include (reported by kernel test robot <lkp@intel.com>)
Change in netdev_core_stats_alloc() (Jakub)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: jeffreyji <jeffreyji@google.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch add a new bonding option ns_ip6_target, which correspond
to the arp_ip_target. With this we set IPv6 targets and send IPv6 NS
request to determine the health of the link.
For other related options like the validation, we still use
arp_validate, and will change to ns_validate later.
Note: the sysfs configuration support was removed based on
https://lore.kernel.org/netdev/8863.1645071997@famine
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new bonding parameter ns_targets to store IPv6 address.
Add required bond_ns_send/rcv functions first before adding
IPv6 address option setting.
Add two functions bond_send/rcv_validate so we can send/recv
ARP and NS at the same time.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function bond_handle_vlan() is split from bond_arp_send() for later
IPv6 usage.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.
Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.
Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0
Fixes: ff59c4563a ("[PATCH] bonding: support carrier state for master")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
cleanup_net() is competing with other rtnl users.
Batching bond_net_exit() factorizes all rtnl acquistions
to a single one, giving chance for cleanup_net()
to progress much faster, holding rtnl a bit longer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When 803.2ad mode enables a participating port, it should update
the slave-array. I have observed that the member links are participating
and are part of the active aggregator while the traffic is egressing via
only one member link (in a case where two links are participating). Via
kprobes I discovered that slave-arr has only one link added while
the other participating link wasn't part of the slave-arr.
I couldn't see what caused that situation but the simple code-walk
through provided me hints that the enable_port wasn't always associated
with the slave-array update.
Fixes: ee63771474 ("bonding: Simplify the xmit function for modes that use xmit_hash")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20220207222901.1795287-1-maheshb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since ipv6 neighbor solicitation and advertisement messages
isn't handled gracefully in bond6 driver, we can see packet
drop due to inconsistency between mac address in the option
message and source MAC .
Another examples is ipv6 neighbor solicitation and advertisement
messages from VM via tap attached to host bridge, the src mac
might be changed through balance-alb mode, but it is not synced
with Link-layer address in the option message.
The patch implements bond6's tx handle for ipv6 neighbor
solicitation and advertisement messages.
Suggested-by: Hu Yadi <huyd12@chinatelecom.cn>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current release - new code bugs:
- tcp: add a missing sk_defer_free_flush() in tcp_splice_read()
- tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n
- nf_tables: set last expression in register tracking area
- nft_connlimit: fix memleak if nf_ct_netns_get() fails
- mptcp: fix removing ids bitmap setting
- bonding: use rcu_dereference_rtnl when getting active slave
- fix three cases of sleep in atomic context in drivers: lan966x, gve
- handful of build fixes for esoteric drivers after netdev->dev_addr
was made const
Previous releases - regressions:
- revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
Linux compatibility with USGv6 tests
- procfs: show net device bound packet types
- ipv4: fix ip option filtering for locally generated fragments
- phy: broadcom: hook up soft_reset for BCM54616S
Previous releases - always broken:
- ipv4: raw: lock the socket in raw_bind()
- ipv4: decrease the use of shared IPID generator to decrease the
chance of attackers guessing the values
- procfs: fix cross-netns information leakage in /proc/net/ptype
- ethtool: fix link extended state for big endian
- bridge: vlan: fix single net device option dumping
- ping: fix the sk_bound_dev_if match in ping_lookup
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHy4mMACgkQMUZtbf5S
IrvOZg/9HyOFAJrCYBlgA3zskHBdqYOGn9M3LCIevBcrCzQeigT+U1uWCINfBn+H
DmsljeYKTicHZ38+HjdNXmzdnMqHtU+iJl4Ep1mcDNywygofW8JcS2Nf0n6Y+hK6
nzyEa23DBt9UAiLmGXUTIoJwEhDRbuL/eH1/ZkkPLG7GtShtEDAKHg+dJBgHbYgJ
0MQs3Q4s6AQ1PYOC0Z0zByhpSrAo2c4X/tr6g2ExNxU0vnydUbjIME0a5clFULr+
ziVeOo4e83FINPaZiYAXEDbMGUC0z+rp1RoGsgRCdTnixi5BclkmEeGRaChYJHTZ
T7tfIC2H0vZHu5/pAXFqwEHiRbminLv9jLkvA1/J67jbnpoNWTLD2jkuIWFlaY/Z
xDm7LnVBB1CdLmXYo2ItSC/8ws9GANpJOq/vFvm+uOYZNKUVctfQ5viA3+hOSULC
6BJHC0m5UminHZPVge9s1XZClarHK4jMMTH9Du2sHLsl3fedNxbgvcVPFdHswLdF
uYiUGMSrIXuQjXw6SNmR4/voJgzikvYhT+jwMn4vTeWoFQFi5eNUch0MPfUImlXG
e3T6WJHrOY3yJFyWQQ9GGLStchD72+iGq2uWLfOIyu9NRKCNBj4Kkm3bUvfqYp5b
d5sP/nl93o3um4WskxB/fDLyhSXWjprgM9mKI45ilPhUC8bWQyo=
=mwR3
-----END PGP SIGNATURE-----
Merge tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and can.
Current release - new code bugs:
- tcp: add a missing sk_defer_free_flush() in tcp_splice_read()
- tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n
- nf_tables: set last expression in register tracking area
- nft_connlimit: fix memleak if nf_ct_netns_get() fails
- mptcp: fix removing ids bitmap setting
- bonding: use rcu_dereference_rtnl when getting active slave
- fix three cases of sleep in atomic context in drivers: lan966x, gve
- handful of build fixes for esoteric drivers after netdev->dev_addr
was made const
Previous releases - regressions:
- revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
Linux compatibility with USGv6 tests
- procfs: show net device bound packet types
- ipv4: fix ip option filtering for locally generated fragments
- phy: broadcom: hook up soft_reset for BCM54616S
Previous releases - always broken:
- ipv4: raw: lock the socket in raw_bind()
- ipv4: decrease the use of shared IPID generator to decrease the
chance of attackers guessing the values
- procfs: fix cross-netns information leakage in /proc/net/ptype
- ethtool: fix link extended state for big endian
- bridge: vlan: fix single net device option dumping
- ping: fix the sk_bound_dev_if match in ping_lookup"
* tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
net: bridge: vlan: fix memory leak in __allowed_ingress
net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
ipv4: remove sparse error in ip_neigh_gw4()
ipv4: avoid using shared IP generator for connected sockets
ipv4: tcp: send zero IPID in SYNACK messages
ipv4: raw: lock the socket in raw_bind()
MAINTAINERS: add missing IPv4/IPv6 header paths
MAINTAINERS: add more files to eth PHY
net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
net: bridge: vlan: fix single net device option dumping
net: stmmac: skip only stmmac_ptp_register when resume from suspend
net: stmmac: configure PTP clock source prior to PTP initialization
Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
connector/cn_proc: Use task_is_in_init_pid_ns()
pid: Introduce helper task_is_in_init_pid_ns()
gve: Fix GFP flags when allocing pages
net: lan966x: Fix sleep in atomic context when updating MAC table
net: lan966x: Fix sleep in atomic context when injecting frames
ethernet: seeq/ether3: don't write directly to netdev->dev_addr
ethernet: 8390/etherh: don't write directly to netdev->dev_addr
...
bond_option_active_slave_get_rcu() should not be used in rtnl_mutex as it
use rcu_dereference(). Replace to rcu_dereference_rtnl() so we also can use
this function in rtnl protected context.
With this update, we can rmeove the rcu_read_lock/unlock in
bonding .ndo_eth_ioctl and .get_ts_info.
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Fixes: 94dd016ae5 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove PDE_DATA() completely and replace it with pde_data().
[akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c]
[akpm@linux-foundation.org: now fix it properly]
Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wrong hash sends single stream to multiple output interfaces.
The offset calculation was relative to skb->head, fix it to be relative
to skb->data.
Fixes: a815bde56b ("net, bonding: Refactor bond_xmit_hash for use with
xdp_buff")
Reviewed-by: Jussi Maki <joamaki@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Linux bonding scenario, one packet is copied to several copies and sent
by all slave device of bond0 in mode 3(broadcast mode). The mode 3 xmit
function bond_xmit_broadcast() only ueses the last slave device's tx result
as the final result. In this case, if the last slave device is down, then
it always return NET_XMIT_DROP, even though the other slave devices xmit
success. It may cause the tx statistics error, and cause the application
(e.g. scp) consider the network is unreachable.
For example, use the following command to configure server A.
echo 3 > /sys/class/net/bond0/bonding/mode
ifconfig bond0 up
ifenslave bond0 eth0 eth1
ifconfig bond0 192.168.1.125
ifconfig eth0 up
ifconfig eth1 down
The slave device eth0 and eth1 are connected to server B(192.168.1.107).
Run the ping 192.168.1.107 -c 3 -i 0.2 command, the following information
is displayed.
PING 192.168.1.107 (192.168.1.107) 56(84) bytes of data.
64 bytes from 192.168.1.107: icmp_seq=1 ttl=64 time=0.077 ms
64 bytes from 192.168.1.107: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 192.168.1.107: icmp_seq=3 ttl=64 time=0.051 ms
192.168.1.107 ping statistics
0 packets transmitted, 3 received
Actually, the slave device eth0 of the bond successfully sends three
ICMP packets, but the result shows that 0 packets are transmitted.
Also if we use scp command to get remote files, the command end with the
following printings.
ssh_exchange_identification: read: Connection timed out
So this patch modifies the bond_xmit_broadcast to return NET_XMIT_SUCCESS
if one slave device in the bond sends packets successfully. If all slave
devices send packets fail, the discarded packets stats is increased. The
skb is released when there is no slave device in the bond or the last slave
device is down.
Fixes: ae46f184bc ("bonding: propagate transmit status")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>