Bonding device forbids slave device of different types under the same
master.
However, it is possible for a bonding master to change type during its
lifetime. This can be either from ARPHRD_ETHER to ARPHRD_INFINIBAND
or the other way arround. The change of type requires device level
multicast address cleanup because device level multicast addresses
depend on the device type.
The patch adds a call to dev_close() before the bonding master changes
type and dev_open() just after that.
In the example below I enslaved an IPoIB device (ib0) under
bond0. Since each bonding master starts as device of type ARPHRD_ETHER
by default, a change of type occurs when ib0 is enslaved.
This is how /proc/net/dev_mcast looks like without the patch
5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05
5 bond0 1 0 01005e000116
5 bond0 1 0 01005e7ffffd
5 bond0 1 0 01005e000001
5 bond0 1 0 333300000001
6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05
6 ib0 1 0 333300000001
6 ib0 1 0 01005e000001
6 ib0 1 0 01005e7ffffd
6 ib0 1 0 01005e000116
6 ib0 1 0 00ffffffff12401bffff00000000000000000001
6 ib0 1 0 00ffffffff12601bffff00000000000000000001
and this is how it looks like after the patch.
5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05
5 bond0 1 0 00ffffffff12601bffff00000000000000000001
5 bond0 1 0 00ffffffff12401bffff0000000000000ffffffd
5 bond0 1 0 00ffffffff12401bffff00000000000000000116
5 bond0 1 0 00ffffffff12401bffff00000000000000000001
6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05
6 ib0 1 0 00ffffffff12401bffff00000000000000000116
6 ib0 1 0 00ffffffff12401bffff0000000000000ffffffd
6 ib0 2 0 00ffffffff12401bffff00000000000000000001
6 ib0 2 0 00ffffffff12601bffff00000000000000000001
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need to rework how bonding devices are initialized to make it more
amenable to creating bonding devices via netlink.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove bogus non-portable possibly unaligned way of testing
for zero addres..
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding device acts unlike all other Linux network device functions
in that it ignores case of device names. The developer must have come
from windows!
Cleanup the management of names and use standard routines where possible.
Flag places where bonding device still doesn't work right with network
namespaces.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "expected_refcount" stuff in bonding sysfs module is a mistake.
Sysfs does proper refcounting, and it is okay to remove a bond device
that has some user process holding the file open.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resolve some of the complaints from checkpatch, and remove "magic emacs format"
comments, and useless MODULE_SUPPORTED_DEVICE(). But should not
change actual code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not safe to use a network device destructor that is a function in
the module, since it can be called after module is unloaded if sysfs
handle is open.
When eventually using netlink, the device cleanup code needs to be done
via uninit function.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole read/write semaphore locking can be removed. It doesn't add any
protection that isn't already done by using the RTNL mutex properly.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid a unnecessary carrier state transistion that happens when device
is registered.
Lockdep works better if initialization is done before registration as well.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_create() is always called with same parameters so move the argument
down.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some users still load bond module multiple times to create bonding
devices. This accidentally was broken by a later patch about
the time sysfs was fixed. According to Jay, it was broken
by:
commit b8a9787edd
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Fri Jun 13 18:12:04 2008 -0700
bonding: Allow setting max_bonds to zero
Note: sysfs and procfs still produce WARN() messages when this is done
so the sysfs method is the recommended API.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).
CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.
It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.
David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().
List of devices that must clear this flag is :
- loopback device, because it calls netif_rx() and quoting Patrick :
"ip_route_input() doesn't accept loopback addresses, so loopback packets
already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function
- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sysfs files for a network device can not unconditionally take the
rtnl_lock as the bonding sysfs files do. If someone accesses those
sysfs files while the network device is being unregistered with the
rtnl_lock held we will deadlock.
So use trylock and restart_syscall to avoid this problem.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the purposes of bonding is to allow for redundant links, and failover
correctly if the cable is pulled. If all the members of a bonded device have
no carrier present, the bonded device itself needs to report no carrier present
to user space so management tools (like routing daemons) can respond.
Bonding in 802.3ad mode does not work correctly for this because it incorrectly
chooses a link that is down as a possible aggregator.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct net_device trans_start field is a hot spot on SMP and high performance
devices, particularly multi queues ones, because every transmitter dirties
it. Is main use is tx watchdog and bonding alive checks.
But as most devices dont use NETIF_F_LLTX, we have to lock
a netdev_queue before calling their ndo_start_xmit(). So it makes
sense to move trans_start from net_device to netdev_queue. Its update
will occur on a already present (and in exclusive state) cache line, for
free.
We can do this transition smoothly. An old driver continue to
update dev->trans_start, while an updated one updates txq->trans_start.
Further patches could also put tx_bytes/tx_packets counters in
netdev_queue to avoid dirtying dev->stats (vlan device comes to mind)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch fixes issues with dev->dev_addr changing from array to pointer.
Hopefully there are no others.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If module initialisation failed (e.g. because the bonding sysfs entry
cannot be created), kernel panics:
IP: [<ffffffff8024910a>] destroy_workqueue+0x2d/0x146
Call Trace:
[<ffffffff808268c4>] bond_destructor+0x28/0x78
[<ffffffff80b64471>] netdev_run_todo+0x231/0x25a
[<ffffffff80b6dbcd>] rtnl_unlock+0x9/0xb
[<ffffffff81567907>] bonding_init+0x83e/0x84a
Remove the calls to bond_work_cancel_all() and destroy_workqueue();
both are also called/scheduled via bond_free_all().
bond_destroy_sysfs is unecessary because the sysfs entry has
not been created in the error case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ETH_P_SLOW is already defined in include/linux/if_ether.h.
There's no need to define BOND_ETH_P_LACPDU in drivers/net/bonding/bond_3ad.h
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove CONFIG_PROC_FS ifdefs from the code by adding void functions.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bonding/bond_main.c | 30 ++++++++++++++++++++----------
1 files changed, 20 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix locking issue in alb MAC address management; removed
incorrect locking and replaced with correct locking. This bug was
introduced in commit 059fe7a578
("bonding: Convert locks to _bh, rework alb locking for new locking")
Bug reported by Paul Smith <paul@mad-scientist.net>, who also
tested the fix.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the cleanup in bond_create nicer :) Also now the forgotten
free_netdev is called.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_slave_info_query() should keep a read lock while accessing slave info,
or risk accessing stale data and corruption.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pointed out by Sean E. Millichamp.
Quote from Documentation/networking/bonding.txt:
"Note that when a bonding interface has no active links, the
driver will immediately reuse the first link that goes up, even if the
updelay parameter has been specified (the updelay is ignored in this
case). If there are slave interfaces waiting for the updelay timeout
to expire, the interface that first went into that state will be
immediately reused. This reduces down time of the network if the
value of updelay has been overestimated, and since this occurs only in
cases with no connectivity, there is no additional penalty for
ignoring the updelay."
This patch actually changes the behaviour in this way.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bonding/bond_main.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch only changes the order of interfaces to use for checking slave link
status in bond_check_dev_link() to priorize ethtool interface. Should safe some
troubles as ethtool seems to be more supported.
Jirka
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bonding/bond_main.c | 26 ++++++++++++--------------
1 files changed, 12 insertions(+), 14 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove debug printk I accidently left in as part of commit:
commit 6146b1a4da
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Tue Nov 4 17:51:15 2008 -0800
bonding: Fix ALB mode to balance traffic on VLANs
Reported by Duncan Gibb <duncan.gibb@siriusit.co.uk>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a zero address hole bug in the bonding arp_ip_target list
that was causing the bond to ignore ARP replies (bugz 13006).
Instead of just setting the array entry to zero, we now
copy any additional entries down one slot, putting the
zero entry at the end. With this change we can now have
all the loops that walk the array stop when they hit a zero
since there will be no addresses after it.
Changes are based in part on code fragment provided in kernel:
bugzilla 13006:
http://bugzilla.kernel.org/show_bug.cgi?id=13006
by Steve Howard <steve@astutenetworks.com>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
I've hit an issue on my system when I've been using RealTek RTL8139D cards in
bonding interface in mode balancing-alb. When I enslave a card, the current
active slave (bond->curr_active_slave) is not set and the link is therefore
not functional.
----
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:2f:22
----
The thing that gets it right is when I unplug the cable and then I put it back
into the NIC. Then the current active slave is set to eth1 and link is working
just fine. Here is dmesg log with bonding DEBUG messages turned on:
----
ADDRCONF(NETDEV_UP): bond0: link is not ready
event_dev: bond0, event: 1
IFF_MASTER
event_dev: bond0, event: 8
IFF_MASTER
bond_ioctl: master=bond0, cmd=35216
slave_dev=cac5d800:
slave_dev->name=eth1:
eth1: ! NETIF_F_VLAN_CHALLENGED
event_dev: eth1, event: 8
eth1: link up, 100Mbps, full-duplex, lpa 0xC5E1
event_dev: eth1, event: 1
event_dev: eth1, event: 8
IFF_SLAVE
Initial state of slave_dev is BOND_LINK_UP
bonding: bond0: enslaving eth1 as an active interface with an up link.
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
event_dev: bond0, event: 4
IFF_MASTER
bond0: no IPv6 routers present
<<<<cable unplug>>>>
eth1: link down
event_dev: eth1, event: 4
IFF_SLAVE
bonding: bond0: link status definitely down for interface eth1, disabling it
event_dev: bond0, event: 4
IFF_MASTER
<<<<cable plug>>>>
eth1: link up, 100Mbps, full-duplex, lpa 0xC5E1
event_dev: eth1, event: 4
IFF_SLAVE
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: making interface eth1 the new active one.
event_dev: eth1, event: 8
IFF_SLAVE
event_dev: eth1, event: 8
IFF_SLAVE
bonding: bond0: first active interface up!
event_dev: bond0, event: 4
IFF_MASTER
----
The current active slave is set by calling bond_select_active_slave() function
from bond_miimon_commit() function when the slave (eth1) link goes to state up.
I also tested this on other machine with Broadcom NetXtreme II BCM5708
1000Base-T NIC and there all works fine. The thing is that this adapter is down
and goes up after few seconds after it is enslaved.
This patch calls bond_select_active_slave() in bond_enslave() function for modes
alb and tlb and makes sure that the current active slave is set up properly even
when the slave state is already up. Tested on both systems, works fine.
Notice: The same problem can maybe also occrur in mode 8023AD but I'm unable to
test that.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects an omission from the following commit:
commit f0c76d6177
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Wed Jul 2 18:21:58 2008 -0700
bonding: refactor mii monitor
The un-refactored code checked the link speed and duplex of
every slave on every pass; the refactored code did not do so.
The 802.3ad and balance-alb/tlb modes utilize the speed and
duplex information, and require it to be kept up to date. This patch
adds a notifier check to perform the appropriate updating when the slave
device speed changes.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Impact: Rename function scope variable.
Fix this sparse warning:
drivers/net/bonding/bond_main.c:4704:13: warning: symbol 'mode' shadows an earlier one
drivers/net/bonding/bond_main.c:95:13: originally declared here
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Base versions handle constant folding now.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the correct pointer in debug message.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
802.3ad has its own ethhdr-like structure in the form of an ad_header,
which is at the start of both the LACPDU and marker PDU. Both are
the same from the struct values, both are packed as well.
It's therefore perfectly fine to replace the ad_header by the ethhdr
and to remove its definition.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generalize out mac address initializer for the LACPDU multicast
address and use in two places. Remove the now unused
AD_MULTICAST_LACPDU_ADDR.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Save some text by initializing ports LACPDU from const initializer,
then get rid of ad_initialize_lacpdu().
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As typedefs are considered a bad thing most of the time remove the
typedef around ad_system.
Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turn ports is_individual into a bool. There is no functional change.
Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turn ports is_enabled into a bool. There is no functional change.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turn Need-To-Transmit port variable into a bool. There is no
functional change.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this sparse warnings:
drivers/net/bonding/bond_main.c:104:20: warning: symbol 'bonding_defaults' was not declared. Should it be static?
drivers/net/bonding/bond_main.c:204:22: warning: symbol 'ad_select_tbl' was not declared. Should it be static?
drivers/net/bonding/bond_sysfs.c:60:21: warning: symbol 'bonding_rwsem' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>