Now that last_arp_rx really has the last time we've received any (validated or
not) ARP, we can use it in slave_last_rx() instead of slave->dev->last_rx.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the options are in place - arp_validate can be set to receive all
the traffic or only arp packets to verify if the slave is up, when the
slave isn't validated.
CC: Rob Landley <rob@landley.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Nikolay Aleksandrov <nikolay@redhat.com>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we can either receive any traffic as a proff of slave being up,
or only *validated* arp traffic (i.e. with src/dst ip checked).
Add an option to be able to specify if we want to receive non-validated arp
traffic only.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we only set bond_arp_rcv() if we're using arp_validate, however
this makes us skip updating last_arp_rx if we're not validating incoming
ARPs - thus, if arp_validate is off, last_arp_rx will never be updated.
Fix this by always setting up recv_probe = bond_arp_rcv, even if we're not
using arp_validate.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we're updating the last_arp_rx only when we've validate the
packet, however afterwards we use it as 'ANY last packet received', but not
only validated ARPs.
Fix this by updating it in case of any packet received. It won't break the
arp_validation=0 because we, anyway, return the correct slave->dev->last_rx in
slave_last_rx().
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently it's disabled because it's sometimes hard, in typical configs, to
make it work - because of the nature how the loadbalance modes work - as
it's hard to deliver valid arp replies to correct slaves by the switch.
However we still can use arp_validation in loadbalance with several other
configs, per example with arp_validate == 2 for backup with one broadcast
domain, without the switch(es) doing any balancing - this way we'd be (a
bit more) sure that the slave is up.
So, enable it to let users decide which one works/suits them best. Also
correct the mode limitation from BOND_OPT_ARP_VALIDATE.
CC: Nikolay Aleksandrov <nikolay@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're always called with rcu_read_lock() held (bond_arp_rcv() is only
called from bond_handle_frame(), which is rx_handler and always called
under rcu from __netif_receive_skb_core() ).
The slave active/passive and/or bonding params can change in-flight, however
we don't really care about that - we only modify the last time packet was
received, which is harmless.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aggregator_identifier is used to assign unique aggregator identifiers
to aggregators of a bond during device enslaving.
aggregator_identifier is currently a global variable that is zeroed in
bond_3ad_initialize().
This sequence will lead to duplicate aggregator identifiers for eth1 and eth3:
create bond0
change bond0 mode to 802.3ad
enslave eth0 to bond0 //eth0 gets agg id 1
enslave eth1 to bond0 //eth1 gets agg id 2
create bond1
change bond1 mode to 802.3ad
enslave eth2 to bond1 //aggregator_identifier is reset to 0
//eth2 gets agg id 1
enslave eth3 to bond0 //eth3 gets agg id 2
Fix this by making aggregator_identifier private to the bond.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).
This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ether_addr_copy is smaller and faster for some architectures.
This relies on a stack frame being at least __aligned(2)
for one use of an Ethernet address on the stack.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing terminating newlines.
Convert uses of pr_info to pr_cont in bond_check_params.
Standardize upper/lower case styles.
Typo fixes, remove unnecessary parentheses and periods.
Alignment neatening.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use more current logging style.
Coalesce formats, realign arguments, drop unnecessary periods.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/bonding/bond_options.c includes rwlock.h directly,
which is a nono, and which also breaks RT kernel build.
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_valid_name() will check the buffer length for input name, no need to
check it twice.
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_set_mac_address() will check the dev->netdev_ops->ndo_set_mac_address,
so no need to check it in bond_set_mac_address().
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding driver take write locks and spin locks that are shared
by the tx path in enslave processing and notification processing,
If the netconsole is in use, the bonding can call printk which puts
us in the netpoll tx path, if the netconsole is attached to the bonding
driver, result in deadlock.
So add protection for these place, by checking the netpoll_block_tx
state, we can defer the sending of the netconsole frames until a later
time using the retransmit feature of netpoll_send_skb that is triggered
on the return code NETDEV_TX_BUSY.
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond enslave processing don't hold bond->lock anymore,
so release an unlocked rw lock will cause warning message,
remove the unwanted read_unlock(&bond->lock).
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fail_over_mac could be set to active or follow in any time for all modes,
so if the fail_over_mac is not none and the current mode is not active-backup,
the bond_set_mac_address() could not change the master and slave's MAC address.
In bond_set_mac_address(), the fail_over_mac should only affect AB mode, so modify
to check the mode in addition to fail_over_mac when setting bond's MAC address.
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to bonding.txt, the fail_over_ma should only affect active-backup mode,
but I found that the fail_over_mac could be set to active or follow in all
modes, this will cause new slave could not be set to bond's MAC address at
enslave processing and restore its own MAC address at removal processing.
The correct way to fix the problem is that we should not add restrictions when
setting options, just need to modify the bond enslave and removal processing
to check the mode in addition to fail_over_mac when setting a slave's MAC during
enslavement. The change active slave processing already only calls the fail_over_mac
function when in active-backup mode.
Thanks for Jay's suggestion.
The patch also modify the pr_warning() to pr_warn().
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 1d3ee88ae0
(bonding: add netlink attributes to slave link dev)
has add rtmsg_ifinfo() in bond_set_active_slave() and
bond_set_backup_slave(), so the two function need to
called in RTNL lock, but bond_loadbalance_arp_mon()
only calling these functions in RCU, warning message
will occurs.
fix this by add a new function bond_slave_state_change(),
which will reset the slave's state after slave link check,
so remove the bond_set_xxx_slave() from the cycle and only
record the slave_state_changed, this will call the new
function to set all slaves to new state in RTNL later.
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we're calling it from under RCU context, however we're using some
functions that require rtnl to be held.
Fix this by restructuring the locking - don't call it under any locks,
aquire rcu_read_lock() if we're sending _only_ (i.e. we have the active
slave present), and use rtnl locking otherwise - if we need to modify
(in)active flags of a slave.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently bond_ab_arp_probe() is always called under rcu_read_lock(),
however to work with curr_active_slave we're still holding the
curr_slave_lock.
To remove that curr_slave_lock - rcu_dereference the bond's
curr_active_slave and use it further - so that we're sure the slave won't
go away, and we don't care if it will change in the meanwhile.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) BPF debugger and asm tool by Daniel Borkmann.
2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.
3) Correct reciprocal_divide and update users, from Hannes Frederic
Sowa and Daniel Borkmann.
4) Currently we only have a "set" operation for the hw timestamp socket
ioctl, add a "get" operation to match. From Ben Hutchings.
5) Add better trace events for debugging driver datapath problems, also
from Ben Hutchings.
6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
have a small send and a previous packet is already in the qdisc or
device queue, defer until TX completion or we get more data.
7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.
8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
Borkmann.
9) Share IP header compression code between Bluetooth and IEEE802154
layers, from Jukka Rissanen.
10) Fix ipv6 router reachability probing, from Jiri Benc.
11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.
12) Support tunneling in GRO layer, from Jerry Chu.
13) Allow bonding to be configured fully using netlink, from Scott
Feldman.
14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
already get the TCI. From Atzm Watanabe.
15) New "Heavy Hitter" qdisc, from Terry Lam.
16) Significantly improve the IPSEC support in pktgen, from Fan Du.
17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
Herbert.
18) Add Proportional Integral Enhanced packet scheduler, from Vijay
Subramanian.
19) Allow openvswitch to mmap'd netlink, from Thomas Graf.
20) Key TCP metrics blobs also by source address, not just destination
address. From Christoph Paasch.
21) Support 10G in generic phylib. From Andy Fleming.
22) Try to short-circuit GRO flow compares using device provided RX
hash, if provided. From Tom Herbert.
The wireless and netfilter folks have been busy little bees too.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
net/cxgb4: Fix referencing freed adapter
ipv6: reallocate addrconf router for ipv6 address when lo device up
fib_frontend: fix possible NULL pointer dereference
rtnetlink: remove IFLA_BOND_SLAVE definition
rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
qlcnic: update version to 5.3.55
qlcnic: Enhance logic to calculate msix vectors.
qlcnic: Refactor interrupt coalescing code for all adapters.
qlcnic: Update poll controller code path
qlcnic: Interrupt code cleanup
qlcnic: Enhance Tx timeout debugging.
qlcnic: Use bool for rx_mac_learn.
bonding: fix u64 division
rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
sfc: Use the correct maximum TX DMA ring size for SFC9100
Add Shradha Shah as the sfc driver maintainer.
net/vxlan: Share RX skb de-marking and checksum checks with ovs
tulip: cleanup by using ARRAY_SIZE()
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
net/cxgb4: Don't retrieve stats during recovery
...
After the option conversion downdelay and updelay divide a u64
and on a 32 bit this causes the following errors:
ERROR: "__udivdi3" [drivers/net/bonding/bonding.ko] undefined!
ERROR: "__umoddi3" [drivers/net/bonding/bonding.ko] undefined!
Fix it by using a normal int instead because newval->value is capped
at INT_MAX by the way the option is defined.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like bridge, bonding as netdevice doesn't cross netns boundaries.
Bonding ports and bonding itself live in same netns.
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow user to identify easily what the attributes are related to. Change
the name of the group to "bonding_slave" to be similar to master which
is named "bonding".
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the new primay is not matching any slave in the bond, the bond should
record it to params, clean the primary slave and select a new active slave.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so slaves would use
the new bonding option API. Also move the option to its own set function
in bond_options.c and fix some style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so lp_interval would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so resend_igmp would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so all_slaves_active would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so queue_id would use
the new bonding option API. Also move it to its own set function in
bond_options.c and fix some style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so active_slave would use
the new bonding option API. Also some trivial/style fixes.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so use_carrier would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so primary_reselect would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so primary would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so miimon would use
the new bonding option API. The "default" definition has been removed as
it was 0.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so num_peer_notif would use
the new bonding option API.
When the auto-sysfs generation is done an alias should be added for
this option as there're currently 2 entries in sysfs for it.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so ad_select would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so min_links would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so lacp_rate would use
the new bonding option API. Also some trivial/style error fixes.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so updelay would use
the new bonding option API. Also some trivial style fixes.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so downdelay would use
the new bonding option API. Also some trivial style fixes.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so arp_ip_target would use
the new bonding option API. This option is an exception because of
the way it's currently implemented that's why its netlink code is
a bit different from the other options to keep the functionality as
before and at the same time to have a single set function.
This patch also fixes a few stylistic errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so arp_interval would use
the new bonding option API. The "default" definition has been removed as
it was 0.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so fail_over_mac would use
the new bonding option API. Also fixes a trivial copy/paste error in
bond_check_params where the wrong variable was used for the error msg.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so arp_all_targets would use the
new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so arp_validate would use the
new bonding option API. Also fix some trivial/style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so xmit_hash_policy would use the
new bonding option API. Also fix some trivial/style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so packets_per_slave would use the
new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the bond's mode setting use the new option API and
adds support for dependency printing which relies on having an entry for
the mode option in the bond_opts[] array.
Also add the ability to print the mode name when mode dependency fails
and fix some trivial/style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary basic infrastructure to support
centralized and unified option manipulation API for the bonding. The new
structure bond_option will be used to describe each option with its
dependencies on modes which will be checked automatically thus removing a
lot of duplicated code. Also automatic range checking is added for
some options. Currently the option setting function requires RTNL to
be acquired prior to calling it, since many options already rely on RTNL
it seemed like the best choice to protect all against common race
conditions.
In order to add an option the following steps need to be done:
1. Add an entry BOND_OPT_<option> to bond_options.h so it gets a unique id
and a bit corresponding to the id
2. Add a bond_option entry to the bond_opts[] array in bond_options.c which
describes the option, its dependencies and its manipulation function
3. Add code to export the option through sysfs and/or as a module parameter
(the sysfs export will be made automatically in the future)
The options can have different flags set, currently the following are
supported:
BOND_OPTFLAG_NOSLAVES - require that the bond device has no slaves prior
to setting the option
BOND_OPTFLAG_IFDOWN - require that the bond device is down prior to
setting the option
BOND_OPTFLAG_RAWVAL - don't parse the value but return it raw for the
option to parse
There's a new value structure to describe different types of values
which can have the following flags:
BOND_VALFLAG_DEFAULT - marks the default option (permanent string alias
to this option is "default")
BOND_VALFLAG_MIN - the minimum value that this option can have
BOND_VALFLAG_MAX - the maximum value that this option can have
An example would be nice here, so if we have an option which can have
the values "off"(2), "special"(4, default) and supports a range, say
16 - 32, it should be defined as follows:
"off", 2,
"special", 4, BOND_VALFLAG_DEFAULT,
"rangemin", 16, BOND_VALFLAG_MIN,
"rangemax", 32, BOND_VALFLAG_MAX
So we have the valid intervals: [2, 2], [4, 4], [16, 32]
Also the valid strings: "off" = 2, "special" and "default" = 4
"rangemin" = 16, "rangemax" = 32
BOND_VALFLAG_(MIN|MAX) can be used to specify a valid range for an
option, if MIN is omitted then 0 is considered as a minimum. If an
exact match is found in the values[] table it will be returned,
otherwise the range is tried (if available).
The option parameter passing is done by using a special structure called
bond_opt_value which can take either a string or a value to parse. One
of the bond_opt_init(val|str) macros should be used depending on which
one does the user want to parse (string or value). Then a call to
__bond_opt_set should be done under RTNL.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.
This has been fixed by Eric Dumazet in commit aee636c480
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:
1) Initialization:
int l = ceil(log_2 d)
uword m' = floor((1<<32)*((1<<l)-d)/d)+1
int sh_1 = min(l,1)
int sh_2 = max(l-1,0)
2) For q = n/d, all uword:
uword t = (n*m')>>32
q = (t+((n-t)>>sh_1))>>sh_2
The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.
Joint work with Daniel Borkmann.
[1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
[2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
[3] https://gmplib.org/~tege/division-paper.pdf
[4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
[5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
[6] http://www.agner.org/optimize/asmlib.zip
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull RCU updates from Ingo Molnar:
- add RCU torture scripts/tooling
- static analysis improvements
- update RCU documentation
- miscellaneous fixes
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
rcu: Remove "extern" from function declarations in kernel/rcu/rcu.h
rcu: Remove "extern" from function declarations in include/linux/*rcu*.h
rcu/torture: Dynamically allocate SRCU output buffer to avoid overflow
rcu: Don't activate RCU core on NO_HZ_FULL CPUs
rcu: Warn on allegedly impossible rcu_read_unlock_special() from irq
rcu: Add an RCU_INITIALIZER for global RCU-protected pointers
rcu: Make rcu_assign_pointer's assignment volatile and type-safe
bonding: Use RCU_INIT_POINTER() for better overhead and for sparse
rcu: Add comment on evaluate-once properties of rcu_assign_pointer().
rcu: Provide better diagnostics for blocking in RCU callback functions
rcu: Improve SRCU's grace-period comments
rcu: Fix CONFIG_RCU_FANOUT_EXACT for odd fanout/leaf values
rcu: Fix coccinelle warnings
rcutorture: Stop tracking FSF's postal address
rcutorture: Move checkarg to functions.sh
rcutorture: Flag errors and warnings with color coding
rcutorture: Record results from repeated runs of the same test scenario
rcutorture: Test summary at end of run with less chattiness
rcutorture: Update comment in kvm.sh listing typical RCU trace events
rcutorture: Add tracing-enabled version of TREE08
...
If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest. Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.
Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes. Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add sub-directory under /sys/class/net/<interface>/slave with
read-only attributes for slave. Directory only appears when
<interface> is a slave.
$ tree /sys/class/net/eth2/slave/
/sys/class/net/eth2/slave/
├── ad_aggregator_id
├── link_failure_count
├── mii_status
├── perm_hwaddr
├── queue_id
└── state
$ cat /sys/class/net/eth2/slave/*
2
0
up
40:02:10:ef:06:01
0
active
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if a slave's name change, we just pass it by. However, if the
slave is a current primary_slave, then we end up with using a slave, whose
name != params.primary, for primary_slave. And vice-versa, if we don't have
a primary_slave but have params.primary set - we will not detected a new
primary_slave.
Fix this by catching the NETDEV_CHANGENAME event and setting primary_slave
accordingly. Also, if the primary_slave was changed, issue a reselection of
the active slave, cause the priorities have changed.
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following call chain indicates that bond_do_ioctl() is protected
under rtnl_lock. If we use __dev_get_by_name() instead of
dev_get_by_name() to find interface handler in it, this would
help us avoid to change reference counter of interface once.
dev_ioctl()
rtnl_lock()
dev_ifsioc()
bond_do_ioctl()
rtnl_unlock()
Additionally we also change the coding style in bond_do_ioctl(),
letting it more readable for us.
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the implementation is meaningless - once again, we take the
slave structure and use it after we've exited RCU critical section.
Fix this by removing the rcu_read_lock() from __get_active_agg(), and
ensuring that all its callers are holding RCU.
Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the RCU read lock usage is just wrong - it gets the slave struct
under RCU and continues to use it when RCU lock is released.
However, it's still safe to do this cause we didn't need the
rcu_read_lock() initially - all of the __get_first_agg() callers are either
holding RCU read lock or the RTNL lock, so that we can't sync while in it.
Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, its usage is just plainly wrong. It first gets a slave under
RCU, and, after releasing the RCU lock, continues to use it - whilst it can
be freed.
Fix this by ensuring that bond_3ad_set_carrier() holds RCU till it uses its
slave (or its agg).
Fixes: be79bd048a ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
That code has been around for ages without being used.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's a huge mess currently, that is really hard to read. This cleanup
doesn't touch the logic at all, it only breaks easy-to-fix long lines and
updates comment styles.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:
- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
instead of lower device which misses the necessary txq synchronization for
lower device such as txq stopping or frozen required by dev watchdog or
control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
when tso is disabled for lower device.
Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.
With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.
In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c
ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.
qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.
Signed-off-by: David S. Miller <davem@davemloft.net>
It returns 0 in case of success, !0 error otherwise. Fix the improper error
verification.
Fixes: 2c9839c143 ("bonding: add num_grat_arp attribute netlink support")
CC: sfeldma@cumulusnetworks.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add bounds checking for params defined with parm tbl.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing space for IFLA_BOND_ARP_IP_TARGET nest header.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add nested IFLA_BOND_AD_INFO for bonding 802.3ad info.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
ad_select via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_AD_LACP_RATE to allow get/set of bonding parameter
lacp_rate via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More functions in bonding that can be declared static because
they are only used in one file.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The net_device.dev_addr have more than 2 bytes of additional data after
the mac addr, so it is safe to use the ether_addr_equal_64bits().
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm sure the operand slave and bond for the function will not
be NULL, so the check for the bond will not make any sense, so
remove the judgement, and the return value was useless here,
remove the unwanted return value.
The comments for the bond 3ad is too old, cleanup some errors
and warming.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value for bond_dev_queue_xmit() will not be used anymore,
so remove the return value.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the skb is xmit by the function bond_slave_override(),
it will have duplicate judgement for slave state, and I think it
will consumes a little performance, maybe it is negligible,
so I simplify the function and remove the unwanted judgement.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_alb_xmit will check the return value for
bond_dev_queue_xmit() every time, but the bond_dev_queue_xmit()
is always return 0, it is no need to check the value every time,
so remove the unneed judgement for the xmit path.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_dev_queue_xmit() will always return 0, and as a fast path,
it is inappropriate to check the res value when xmit every package,
so remove the res check and avoid once judgement for xmit.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use possibly more efficient ether_addr_equal_64bits
to instead of memcmp.
Modify the MAC_ADDR_COMPARE to MAC_ADDR_EQUAL, this looks more
appropriate.
The comments for the bond 3ad is too old, cleanup some errors
and warming.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond 3ad and TLB/ALB has the same check path, so combine them.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond driver could set the lp_interval when loading module.
Suggested-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_xmit_slave_id is only used in main.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_PACKETS_PER_SLAVE to allow get/set of bonding parameter
packets_per_slave via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_LP_INTERVAL to allow get/set of bonding parameter
lp_interval via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_MIN_LINKS to allow get/set of bonding parameter
min_links via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_ALL_SLAVES_ACTIVE to allow get/set of bonding parameter
all_slaves_active via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_NUM_PEER_NOTIF to allow get/set of bonding parameter
num_grat_arp via netlink. Bonding parameter num_unsol_na is
synonymous with num_grat_arp, so add only one netlink attribute
to represent both bonding parameters.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_3ad_handle_link_change is called with RTNL only,
and the function will modify the port's information with
no further locking, it will not mutex against bond state
machine and incoming LACPDU which do not hold RTNL, So I
add __get_state_machine_lock to protect the port.
But it is not a critical bug, it exist since day one, and till
now it has never been hit and reported, because changes to
speed is very rare, and will not occur critical problem.
The comments in the function is very old, cleanup it and
add a new pr_debug to debug the port message.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jay Vosburgh said that the bond_3ad_adapter_duplex_changed is
called with RTNL only, and the function will modify the port's
information with no further locking, it will not mutex against
bond state machine and incoming LACPDU which do not hold RTNL,
So I add __get_state_machine_lock to protect the port.
But it is not a critical bug, it exist since day one, and till
now it has never been hit and reported, because changes to
speed is very rare, and will not occur critical problem.
The comments in the function is very old, cleanup it.
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jay Vosburgh said that the bond_3ad_adapter_speed_changed is
called with RTNL only, and the function will modify the port's
information with no further locking, it will not mutex against
bond state machine and incoming LACPDU which do not hold RTNL,
So I add __get_state_machine_lock to protect the port.
But it is not a critical bug, it exist since day one, and till
now it has never been hit and reported, because changes to
speed is very rare, and will not occur critical problem.
The comment in the function is very old, cleanup it.
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_RESEND_IGMP to allow get/set of bonding parameter
resend_igmp via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_XMIT_HASH_POLICY to allow get/set of bonding parameter
xmit_hash_policy via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_FAIL_OVER_MAC to allow get/set of bonding parameter
fail_over_mac via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_PRIMARY_SELECT to allow get/set of bonding parameter
primary_select via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_PRIMARY to allow get/set of bonding parameter
primary via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_resend_igmp_join_requests_delayed() and
bond_resend_igmp_join_requests() should be integrated,
because the bond_resend_igmp_join_requests_delayed() did
nothing except bond_resend_igmp_join_requests().
The bond igmp_retrans could only be changed in bond_change_active_slave
and here, bond_change_active_slave will be called in RTNL and curr_slave_lock,
the bond_resend_igmp_join_requests already hold RTNL, so no need
to free RTNL and hold curr_slave_lock again, it may be a small optimization,
so move the igmp_retrans in RTNL and remove the curr_slave_lock.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_select_active_slave() will not release and acquire
bond lock, so it is no need to read the bond lock for them,
and the bond_store_primaryxxx() is already in RTNL, so remove the
unwanted lock.
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_option_active_slave_set() is always called in RTNL,
the RTNL could protect bond slave list, so remove the unwanted
bond lock.
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_3ad_state_machine_handler() use the bond lock to protect
the bond slave list and slave port together, but it is not enough,
the bond slave list was link and unlink in RTNL, not bond lock,
so I add RCU to protect the slave list from leaving.
The bond lock is still used here, because when the slave has been
removed from the list by the time the state machine runs, it appears
to be possible for both function to manupulate the same aggregator->lag_ports
by finding the aggregator via two different ports that are both members of
that aggregator (i.e., port A of the agg is being unbound, and port B
of the agg is runing its state machine).
If I remove the bond lock, there are nothing to mutex changes
to aggregator->lag_ports between bond_3ad_state_machine_handler and
bond_3ad_unbind_slave, So the bond lock is the simplest way to protect
aggregator->lag_ports.
There was a lot of function need RCU protect, I have two choice
to make the function in RCU-safe, (1) create new similar functions
and make the bond slave list in RCU. (2) modify the existed functions
and make them in read-side critical section, because the RCU
read-side critical sections may be nested.
I choose (2) because it is no need to create more similar functions.
The nots in the function is still too old, clean up the nots.
Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_change_active_slave() and bond_select_active_slave()
do't need bond lock anymore, so remove the unwanted bond lock
for these two functions.
The bond_select_active_slave() will release and acquire
curr_slave_lock, so the curr_slave_lock need to protect
the function.
In bond enslave and bond release, the bond slave list is also
protected by RTNL, so bond lock is no need to exist, remove
the lock and clean the functions.
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_activebackup_arp_mon() use the bond lock for read to
protect the slave list, it is no effect, and the RTNL is only
called for bond_ab_arp_commit() and peer notify, for the performance
better, use RCU to replace with the bond lock, to the bond slave
list need to called in RCU, add a new bond_first_slave_rcu()
to get the first slave in RCU protection.
In bond_ab_arp_probe(), the bond->current_arp_slave may changd
if bond release slave, just like:
bond_ab_arp_probe() bond_release()
cpu 0 cpu 1
...
if (bond->current_arp_slave...) ...
... bond->current_arp_slave = NULl
bond->current_arp_slave->dev->name ...
So the current_arp_slave need to dereference in the section.
Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_first_slave_rcu() will be used to instead of bond_first_slave()
in rcu_read_lock().
According to the Jay Vosburgh's suggestion, the struct netdev_adjacent
should hide from users who wanted to use it directly. so I package a
new function to get the first slave of the bond.
Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_loadbalance_arp_mon() use the bond lock to protect the
bond slave list, it is no effect, so I could use RTNL or RCU to
replace it, considering the performance impact, the RCU is more
better here, so the bond lock replace with the RCU.
The bond_select_active_slave() need RTNL and curr_slave_lock
together, but there is no RTNL lock here, so add a rtnl_rtylock.
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_alb_monitor use bond lock to protect the bond slave list,
it is no effect here, we need to use RTNL or RCU to replace bond lock,
the bond_alb_monitor will called 10 times one second, RTNL may loss
performance here, so I replace bond lock with RCU to protect the
bond slave list, also the RTNL is preserved, the logic of the monitor
did not changed.
Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_mii_monitor() still use bond lock to protect bond slave list,
it is no effect, I have 2 way to fix the problem, move the RTNL to the
top of the function, or add RCU to protect the bond slave list,
according to the Jay Vosburgh's opinion, 10 times one second is a
truely big performance loss if use RTNL to protect the whole monitor,
so I would take the advice and use RCU to protect the bond slave list.
The bond_has_slave() will not protect by anything, there will no things
happen if the slave list is be changed, unless the bond was free, but
it will not happened before the monitor, the bond will closed before
be freed.
The peers notify for the bond will calling curr_active_slave, so
derefence the slave to make sure we will accessing the same slave
if the curr_active_slave changed, as the rcu dereference need in
read-side critical sector and bond_change_active_slave() will call
it with no RCU hold, so add peer notify in rcu_read_lock which
will be nested in monitor.
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list was no longer protected by bond lock and only
protected by RTNL or RCU, so anywhere that use bond lock to protect
slave list is meaningless.
remove the release and acquire bond lock for bond_select_active_slave().
The curr_active_slave could only be changed in 3 place:
1. enslave slave.
2. release slave.
3. change_active_slave.
all above place were holding bond lock, RTNL and curr_slave_lock
together, it is tedious and meaningless, obviously bond lock is no
need here, but RTNL or curr_slave_lock is needed, so if you want
to access active slave, you have to choose one lock, RTNL or
curr_slave_lock, if RTNL is exist, no need to add curr_slave_lock,
otherwise curr_slave_lock is better, because of the performance.
there are several place calling bond_select_active_slave() and
bond_change_active_slave(), the next step I will clean these place
and remove the no effect lock.
there are some document changed together when update the function.
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_ARP_ALL_TARGETS to allow get/set of bonding parameter
arp_all_targets via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_ARP_VALIDATE to allow get/set of bonding parameter
arp_validate via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_ARP_IP_TARGET to allow get/set of bonding parameter
arp_ip_target via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_ARP_INTERVAL to allow get/set of bonding parameter
arp_interval via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_USE_CARRIER to allow get/set of bonding parameter
use_carrier via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_DOWNDELAY to allow get/set of bonding parameter
downdelay via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_UPDELAY to allow get/set of bonding parameter
updelay via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_MIIMON to allow get/set of bonding parameter
miimon via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although rcu_assign_pointer() can be used to assign a constant
NULL pointer, doing so gets you an unnecessary memory barrier and
in some circumstances, sparse warnings. This commit therefore
changes the rcu_assign_pointer() of NULL in __bond_release_one() to
RCU_INIT_POINTER().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Help of this function says: "in_dev: only on this interface, 0=any interface",
but since commit 39a6d06300 ("[NETNS]: Process inet_confirm_addr in the
correct namespace."), the code supposes that it will never be NULL. This
function is never called with in_dev == NULL, but it's exported and may be used
by an external module.
Because this patch restore the ability to call inet_confirm_addr() with in_dev
== NULL, I partially revert the above commit, as suggested by Julian.
CC: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
Daniel's direct transmit changes depend upon.
Signed-off-by: David S. Miller <davem@davemloft.net>
There's an issue when showing the value of packets_per_slave due to
using signed integer. The value may be < 0 and thus not put through
reciprocal_value() before showing. This patch makes it use unsigned
integer when showing it.
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Veaceslav Falico <vfalico@redhat.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several files refer to an old address for the Free Software Foundation
in the file header comment. Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Paul Mackerras <paulus@samba.org>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I install the bonding with the wrong arp_ip_target,
just like arp_ip_target=500.500.500.500, the arp_ip_target
was transfored to 245.245.245.244 and stored in the ip
target success, it is uncorrect, so I add checks to avoid
adding wrong address.
The in4_pton() will set wrong ip address to 0.0.0.0 and
return 0, also use the micro IS_IP_TARGET_UNUSABLE_ADDRESS
to simplify the code.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because the ARP monitoring is not support for 802.3ad, but I still
could change the mode to 802.3ad from ab mode while ARP monitoring
is running, it is incorrect.
So add a check for 802.3ad in bonding_store_mode to fix the problem,
and make a new macro BOND_NO_USES_ARP() to simplify the code.
v2: according to the Dan Williams's suggestion, bond mode is the most
important bond option, it should override any of the other sub-options.
So when the mode is changed, the conficting values should be cleared
or reset, otherwise the user has to duplicate more operations to modify
the logic. I disable the arp and enable mii monitoring when the bond mode
is changed to AB, TB and 8023AD if the arp interval is true.
v3: according to the Nik's suggestion, the default value of miimon should need
a name, there is several place to use it, and the bond_store_arp_interval()
could use micro BOND_NO_USES_ARP to make the code more simpify.
Suggested-by: Dan Williams <dcbw@redhat.com>
Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I met a Bug when I add ip target with the wrong ip address:
echo +500.500.500.500 > /sys/class/net/bond0/bonding/arp_ip_target
the wrong ip address will transfor to 245.245.245.244 and add
to the ip target success, it is uncorrect, so I add checks to avoid
adding wrong address.
The in4_pton() will set wrong ip address to 0.0.0.0, it will return by
the next check and will not add to ip target.
v2
According Veaceslav's opinion, simplify the code.
v3
According Veaceslav's opinion, add broadcast check and make a micro
definition to package it.
v4
Solve the problem of the format which David point out.
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes two race conditions between bond_store_updelay/downdelay
and bond_store_miimon which could lead to division by zero as miimon can
be set to 0 while either updelay/downdelay are being set and thus miss the
zero check in the beginning, the zero div happens because updelay/downdelay
are stored as new_value / bond->params.miimon. Use rtnl to synchronize with
miimon setting.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the ARP monitoring is not supported with 802.3ad, and it's
prohibited to use it via the module params.
However we still can set it afterwards via sysfs, cause we only check for
*LB modes there.
To fix this - add a check for 802.3ad mode in bonding_store_arp_interval.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) The addition of nftables. No longer will we need protocol aware
firewall filtering modules, it can all live in userspace.
At the core of nftables is a, for lack of a better term, virtual
machine that executes byte codes to inspect packet or metadata
(arriving interface index, etc.) and make verdict decisions.
Besides support for loading packet contents and comparing them, the
interpreter supports lookups in various datastructures as
fundamental operations. For example sets are supports, and
therefore one could create a set of whitelist IP address entries
which have ACCEPT verdicts attached to them, and use the appropriate
byte codes to do such lookups.
Since the interpreted code is composed in userspace, userspace can
do things like optimize things before giving it to the kernel.
Another major improvement is the capability of atomically updating
portions of the ruleset. In the existing netfilter implementation,
one has to update the entire rule set in order to make a change and
this is very expensive.
Userspace tools exist to create nftables rules using existing
netfilter rule sets, but both kernel implementations will need to
co-exist for quite some time as we transition from the old to the
new stuff.
Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
worked so hard on this.
2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
to our pseudo-random number generator, mostly used for things like
UDP port randomization and netfitler, amongst other things.
In particular the taus88 generater is updated to taus113, and test
cases are added.
3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
and Yang Yingliang.
4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
Sujir.
5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
Neal Cardwell, and Yuchung Cheng.
6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
control message data, much like other socket option attributes.
From Francesco Fusco.
7) Allow applications to specify a cap on the rate computed
automatically by the kernel for pacing flows, via a new
SO_MAX_PACING_RATE socket option. From Eric Dumazet.
8) Make the initial autotuned send buffer sizing in TCP more closely
reflect actual needs, from Eric Dumazet.
9) Currently early socket demux only happens for TCP sockets, but we
can do it for connected UDP sockets too. Implementation from Shawn
Bohrer.
10) Refactor inet socket demux with the goal of improving hash demux
performance for listening sockets. With the main goals being able
to use RCU lookups on even request sockets, and eliminating the
listening lock contention. From Eric Dumazet.
11) The bonding layer has many demuxes in it's fast path, and an RCU
conversion was started back in 3.11, several changes here extend the
RCU usage to even more locations. From Ding Tianhong and Wang
Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
Falico.
12) Allow stackability of segmentation offloads to, in particular, allow
segmentation offloading over tunnels. From Eric Dumazet.
13) Significantly improve the handling of secret keys we input into the
various hash functions in the inet hashtables, TCP fast open, as
well as syncookies. From Hannes Frederic Sowa. The key fundamental
operation is "net_get_random_once()" which uses static keys.
Hannes even extended this to ipv4/ipv6 fragmentation handling and
our generic flow dissector.
14) The generic driver layer takes care now to set the driver data to
NULL on device removal, so it's no longer necessary for drivers to
explicitly set it to NULL any more. Many drivers have been cleaned
up in this way, from Jingoo Han.
15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.
16) Improve CRC32 interfaces and generic SKB checksum iterators so that
SCTP's checksumming can more cleanly be handled. Also from Daniel
Borkmann.
17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
using the interface MTU value. This helps avoid PMTU attacks,
particularly on DNS servers. From Hannes Frederic Sowa.
18) Use generic XPS for transmit queue steering rather than internal
(re-)implementation in virtio-net. From Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
random32: add test cases for taus113 implementation
random32: upgrade taus88 generator to taus113 from errata paper
random32: move rnd_state to linux/random.h
random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
random32: add periodic reseeding
random32: fix off-by-one in seeding requirement
PHY: Add RTL8201CP phy_driver to realtek
xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
macmace: add missing platform_set_drvdata() in mace_probe()
ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
ixgbe: add warning when max_vfs is out of range.
igb: Update link modes display in ethtool
netfilter: push reasm skb through instead of original frag skbs
ip6_output: fragment outgoing reassembled skb properly
MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
net_sched: tbf: support of 64bit rates
ixgbe: deleting dfwd stations out of order can cause null ptr deref
ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
...
This patch aims to extend round-robin mode with a new option called
packets_per_slave which can have the following values and effects:
0 - choose a random slave
1 (default) - standard round-robin, 1 packet per slave
>1 - round-robin when >1 packets have been transmitted per slave
The allowed values are between 0 and 65535.
This patch also fixes the comment style in bond_xmit_roundrobin().
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an extra semi-colon so bond_get_size() doesn't return the
correct value.
Fixes: ec76aa4985 ('bonding: add Netlink support active_slave option')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 4d961a101e, reversing
changes made to a00f6fcc7d.
Revert bond locking changes, they cause regressions and Veaceslav Falico
doesn't like how the commit messages were done at all.
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and add the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond slave list may change when the monitor is running, the slave list is no longer
protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it:
1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe
to call call_netdevice_notifiers() in write lock.
2.remove unused bond->lock for monitor function, only use the existing rtnl lock().
3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to
bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored.
so I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Jiri noted, currently we first do all bonding-specific initialization
(specifically - bond_select_active_slave(bond)) before we actually attach
the slave (so that it becomes visible through bond_for_each_slave() and
friends). This might result in bond_select_active_slave() not seeing the
first/new slave and, thus, not actually selecting an active slave.
Fix this by moving all the bond-related init part after we've actually
completely initialized and linked (via bond_master_upper_dev_link()) the
new slave.
Also, remove the bond_(de/a)ttach_slave(), it's useless to have functions
to ++/-- one int.
After this we have all the initialization of the new slave *before*
linking, and all the stuff that needs to be done on bonding *after* it. It
has also a bonus effect - we can remove the locking on the new slave init
completely, and only use it for bond_select_active_slave().
Reported-by: Jiri Pirko <jiri@resnulli.us>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong@huawei.com
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>