Bringing up a bond interface with all network cables disconnected
does not properly set the interface as DOWN because the call to
netif_carrier_off occurs too early in bond_init. The call needs
to occur after register_netdevice has set dev->reg_state to
NETREG_REGISTERED, so that netif_carrier_off will trigger the
call to linkwatch_fire_event.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When packets come in from a device with >= 16 receive queues
headed out a bonding interface, syslog gets filled with this:
kernel: bond0 selects TX queue 16, but real number of TX queues is 16
because queue_mapping is offset by 1. Adjust return value
to account for the offset.
This is a revision of my earlier patch (which did not use the
skb_rx_queue_* helpers - thanks to Ben for the suggestion).
Andy submitted a similar patch which emits a pr_warning on
invalid queue selection, but I believe the log spew is
not useful. We can revisit that question in the future,
but in the interim I believe fixing the core problem is
worthwhile.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.
Signed-off-by: David S. Miller <davem@davemloft.net>
V2: Move #ifdef CONFIG_PROC_FS into bonding.h, as suggested by David.
bond_main.c is bloating, separate the procfs code out,
move them to bond_procfs.c
Signed-off-by: WANG Cong <amwang@redhat.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is a ptype handler holding a clone of this skb, whose
destination MAC addresse is overwritten, the owner of this handler may
get a corrupted packet.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These two functions are only used when net poll controller is enabled.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts bonding to use rx_handler. Results in cleaner
__netif_receive_skb() with much less exceptions needed. Also
bond-specific work is moved into bond code.
Did performance test using pktgen and counting incoming packets by
iptables. No regression noted.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
V4: rebase to net-next-2.6
This patch removes the flag IFF_IN_NETPOLL, we don't need it any more since
we have netpoll_tx_running() now.
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
V4: rebase to net-next-2.6
V3: remove an useless #ifdef.
This patch unifies the netpoll code in bonding with netpoll code in bridge,
thanks to Herbert that code is much cleaner now.
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev->master is now tightly connected to bonding driver. This patch makes
this pointer more general and ready to be used by others.
- netdev_set_master() - bond specifics moved to new function
netdev_set_bond_master()
- introduced netif_is_bond_slave() to check if device is a bonding slave
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
be used by ethtool and might spam dmesg unnecessarily.
This converts the function to use netdev_info() instead of plain printk().
As a side effect, bonding and bridge devices will now log dropped features
on every slave device change.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.
Occurences found by `grep -r` on net/, drivers/net, include/
[ Move features and vlan_features next to each other in
struct netdev, as per Eric Dumazet's suggestion -DaveM ]
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_change_active_slave() may be called when a slave is added, even
if the bond has not been brought up yet. It may then attempt to send
packets, and further it may use mcast_work which is uninitialised
before the bond is brought up. Add the necessary checks for
netif_running(bond->dev).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A bond may have a mixture of slave devices with and without hardware
VLAN tag insertion capability. Therefore it always claims this
capability and performs software VLAN tag insertion if the slave does
not.
Since commit 7b9c609037, this has
also been done by dev_hard_start_xmit(). The result is that VLAN-
tagged skbs are now double-tagged when transmitted through slave
devices without hardware VLAN tag insertion!
Remove the now-redundant logic from bond_dev_queue_xmit().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides the debugfs facility to the bonding driver.
The "bonding" directory is created in the debugfs root and directories of
each bonding interface (like bond0, bond1...) are created in that.
# mount -t debugfs none /sys/kernel/debug
# ls /sys/kernel/debug/bonding
bond0 bond1
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A while back I made some changes to enable netpoll in the bonding driver. Among
them was a per-cpu flag that indicated we were in a path that held locks which
could cause the netpoll path to block in during tx, and as such the tx path
should queue the frame for later use. This appears to have given rise to a
regression. If one of those paths on which we hold the per-cpu flag yields the
cpu, its possible for us to come back on a different cpu, leading to us clearing
a different flag than we set. This results in odd netpoll drops, and BUG
backtraces appearing in the log, as we check to make sure that we only clear set
bits, and only set clear bits. I had though briefly about changing the
offending paths so that they wouldn't sleep, but looking at my origional work
more closely, it doesn't appear that a per-cpu flag is warranted. We alrady
gate the checking of this flag on IFF_IN_NETPOLL, so we don't hit this in the
normal tx case anyway. And practically speaking, the normal use case for
netpoll is to only have one client anyway, so we're not going to erroneously
queue netpoll frames when its actually safe to do so. As such, lets just
convert that per-cpu flag to an atomic counter. It fixes the rescheduling bugs,
is equivalent from a performance perspective and actually eliminates some code
in the process.
Tested by the reporter and myself, successfully
Reported-by: Liang Zheng <lzheng@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Restore the check for an unassigned mac address before adopting the
first slaves as it's own. The change in behavior was introduced by:
commit c20811a79e
Author: Jiri Pirko <jpirko@redhat.com>
bonding: move dev_addr cpy to bond_enslave
Signed-off-by: David Strand <dpstrand@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of iterating in_dev->mc_list from bonding driver, its better
to call a helper function provided by igmp.c
Details of implementation (locking) are private to igmp code.
ip_mc_rejoin_group(struct ip_mc_list *im) becomes
ip_mc_rejoin_groups(struct in_device *in_dev);
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU conversion in IGMP code done in net-next-2.6 raised a race in
__bond_resend_igmp_join_requests().
It iterates in_dev->mc_list without appropriate protection (RTNL, or
read_lock on in_dev->mc_list_lock).
Another cpu might delete an entry while we use it and trigger a fault.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_info_seq_start() uses a read_lock(&dev_base_lock) to make sure
device doesn’t disappear. Same goal can be achieved using RCU.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only used in main file.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some recent testing in netpoll with bonding showed this backtrace
------------[ cut here ]------------
kernel BUG at drivers/net/bonding/bonding.h:134!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.2/usb7/devnum
CPU 0
Pid: 1876, comm: rmmod Not tainted 2.6.36-rc3+ #10 D26928/
RIP: 0010:[<ffffffffa0514ba4>] [<ffffffffa0514ba4>] bond_uninit+0x6f4/0x7a0
RSP: 0018:ffff88003b1b5d58 EFLAGS: 00010296
RAX: ffff88003b9b6200 RBX: ffff8800373e8e00 RCX: 00000000000f4240
RDX: 00000000ffffffff RSI: 0000000000000286 RDI: 0000000000000286
RBP: ffff88003b1b5dc8 R08: 0000000000000000 R09: 00000001af7de920
R10: 0000000000000000 R11: ffff880002495e98 R12: ffff880037922700
R13: ffff880038c31000 R14: ffff880037922730 R15: 0000000000000286
FS: 00007f90e6d72700(0000) GS:ffff880002400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000346f0d9ad0 CR3: 000000003b263000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 1876, threadinfo ffff88003b1b4000, task ffff88003b36aa80)
Stack:
00000000ffffffff ffff88003b1b5d7a ffff8800379221e8 ffff880037922000
<0> ffff88003b1b5dc8 ffffffff813eb5fb ffff88003b1b5da8 0000000031b177a3
<0> ffff88003b1b5da8 ffff880037922000 ffff88003b1b5e48 ffff88003b1b5e48
Call Trace:
[<ffffffff813eb5fb>] ? rtmsg_ifinfo+0xcb/0xf0
[<ffffffff813daad8>] rollback_registered_many+0x168/0x280
[<ffffffff813dac09>] unregister_netdevice_many+0x19/0x80
[<ffffffff813e97b3>] __rtnl_kill_links+0x63/0x90
[<ffffffff813e980b>] __rtnl_link_unregister+0x2b/0x60
[<ffffffff813e9bde>] rtnl_link_unregister+0x1e/0x30
[<ffffffffa052124b>] bonding_exit+0x37/0x51 [bonding]
[<ffffffff81098b2e>] sys_delete_module+0x19e/0x270
[<ffffffff810bb2b2>] ? audit_syscall_entry+0x252/0x280
[<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
RIP [<ffffffffa0514ba4>] bond_uninit+0x6f4/0x7a0 [bonding]
RSP <ffff88003b1b5d58>
---[ end trace 1395ad691cea24d1 ]---
It occurs because of my recent netpoll blocking patches, which I added to avoid
recursive deadlock in the bonding driver. It relies on some per cpu bits, but
the shutdown path forces some rescheduling as we cancel workqueues for the
driver and wait for some device refcounts. If after the forced reschedule, we
wind up on a different cpu we trigger the bughalt in unblock_netpoll_tx.
The fix is to remove the netpoll block/unblock calls from bond_release_all.
This is safe to do because bond_uninit, which is called via ndo_uninit in
rollback_registered_many, doesn't occur until we send a NETDEV_UNREGISTER event,
which triggers netconsole to remove us as a netpoll client, so we are guaranteed
not to recurse into our own tx path here.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the inclusion of previous fixup patches, netpoll over bonding apears to
work reliably with failover conditions. This reverts Gospos previous commit
c22d7ac844, and allows access again to the netpoll
functionality in the bonding driver.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The monitoring paths in the bonding driver take write locks that are shared by
the tx path. If netconsole is in use, these paths can call printk which puts us
in the netpoll tx path, which, if netconsole is attached to the bonding driver,
result in deadlock (the xmit_lock guards are useless in netpoll_send_skb, as the
monitor paths in the bonding driver don't claim the xmit_lock, nor should they).
The solution is to use a per cpu flag internal to the driver to indicate when a
cpu is holding the lock in a path that might recusrse into the tx path for the
driver via netconsole. By checking this flag on transmit, we can defer the
sending of the netconsole frames until a later time using the retransmit feature
of netpoll_send_skb that is triggered on the return code NETDEV_TX_BUSY. I've
tested this and am able to transmit via netconsole while causing failover
conditions on the bond slave links.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding driver currently modifies the netpoll structure in its xmit path
while sending frames from netpoll. This is racy, as other cpus can access the
netpoll structure in parallel. Since the bonding driver points np->dev to a
slave device, other cpus can inadvertently attempt to send data directly to
slave devices, leading to improper locking with the bonding master, lost frames,
and deadlocks. This patch fixes that up.
This patch also removes the real_dev pointer from the netpoll structure as that
data is really only used by bonding in the poll_controller, and we can emulate
its behavior by check each slave for IS_UP.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Effect:
Slave Interface: eth5
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: XX:XX:XX:XX:XX:XX
Slave queue ID: 0
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an interface was enslaved when it was down, bonding thinks
it has speed -1 even after it goes up. This leads into selecting
a wrong active interface in active/backup mode on mixed 10G/1G or
1G/100M environment.
before:
bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex.
bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex.
after:
bonding: bond0: link status definitely up for interface eth5, 10000 Mbps full duplex.
bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
before:
bonding: bond0: link status definitely up for interface eth5
bonding: bond0: link status definitely up for interface eth0
after:
bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex.
bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex.
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow sysadmins to configure the number of multicast
membership report sent on a link failure event.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During a failover, the IGMP membership is sent to update
the switch restoring the traffic, but it misses groups added
to VLAN devices running on top of bonding devices.
This patch changes it to iterate over all VLAN devices
on top of it sending IGMP memberships too.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a WARN_ON failure in bond_masters sysfs file
Got a report of this warning recently
bonding: bond0 is being created...
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:590 proc_register+0x14d/0x185()
Hardware name: ProLiant BL465c G1
proc_dir_entry 'bonding/bond0' already registered
Modules linked in: bonding ipv6 tg3 bnx2 shpchp amd64_edac_mod edac_core
ipmi_si
ipmi_msghandler serio_raw i2c_piix4 k8temp edac_mce_amd hpwdt microcode hpsa
cc
iss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded:
scsi_wai
t_scan]
Pid: 935, comm: ifup-eth Not tainted 2.6.33.5-124.fc13.x86_64 #1
Call Trace:
[<ffffffff8104b54c>] warn_slowpath_common+0x77/0x8f
[<ffffffff8104b5b1>] warn_slowpath_fmt+0x3c/0x3e
[<ffffffff8114bf0b>] proc_register+0x14d/0x185
[<ffffffff8114c20c>] proc_create_data+0x87/0xa1
[<ffffffffa0211e9b>] bond_create_proc_entry+0x55/0x95 [bonding]
[<ffffffffa0215e5d>] bond_init+0x95/0xd0 [bonding]
[<ffffffff8138cd97>] register_netdevice+0xdd/0x29e
[<ffffffffa021240b>] bond_create+0x8e/0xb8 [bonding]
[<ffffffffa021c4be>] bonding_store_bonds+0xb3/0x1c1 [bonding]
[<ffffffff812aec85>] class_attr_store+0x27/0x29
[<ffffffff8115423d>] sysfs_write_file+0x10f/0x14b
[<ffffffff81101acf>] vfs_write+0xa9/0x106
[<ffffffff81101be2>] sys_write+0x45/0x69
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
---[ end trace a677c3f7f8b16b1e ]---
bonding: Bond creation failed.
It happens because a user space writer to bond_master can try to
register an already existing bond interface name. Fix it by teaching
bond_create to check for the existance of devices with that name first
in cases where a non-NULL name parameter has been passed in
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gro can be enabled by default on bonding devices.
Actual support depends on the lower devices.
One can still use ethtool to switch off GRO if needed.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The time_before_eq()/time_after_eq() functions operate on unsigned
long and only work if the difference between the two compared values
is smaller than half the range of unsigned long (31 bits on i386).
Some of the variables (slave->jiffies, dev->trans_start, dev->last_rx)
used by bonding store a copy of jiffies and may not be updated for a
long time. With HZ=1000, time_before_eq()/time_after_eq() will start
giving bad results after ~25 days.
jiffies will never be before slave->jiffies, dev->trans_start,
dev->last_rx by more than possibly a couple ticks caused by preemption
of this code. This allows us to detect/prevent these overflows by
replacing time_before_eq()/time_after_eq() with time_in_range().
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When copying VLAN information to or removing from a slave
during slave addition or removal, the bonding code currently holds
the bond->lock for write to prevent concurrent modification of the
vlan_list / vlgrp.
This is unnecessary, as all of these operations occur under
RTNL. Holding the bond->lock also caused might_sleep issues for
some drivers' ndo_vlan_* functions. This patch removes the extra
locking.
Problem reported by Michael Chan <mchan@broadcom.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit ad1afb0039
("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
it is now regular practice for a VLAN "add vid" for VLAN 0 to
arrive prior to any VLAN registration or creation of a vlan_group.
This patch updates the bonding code that tests for the presence
of VLANs configured above bonding. The new logic tests for bond->vlgrp
to determine if a registration has occured, instead of testing that
bonding's internal vlan_list is empty.
The old code would panic when vlan_list was not empty, but
vlgrp was still NULL (because only an "add vid" for VLAN 0 had occured).
Bonding still adds VLAN 0 to its internal list so that 802.1p
frames are handled correctly on transmit when non-VLAN accelerated
slaves are members of the bond. The test against bond->vlan_list
remains in bond_dev_queue_xmit for this reason.
Modification to the bond->vlgrp now occurs under lock (in
addition to RTNL), because not all inspections of it occur under RTNL.
Additionally, because 8021q will never issue a "kill vid" for
VLAN 0, there is now logic in bond_uninit to release any remaining
entries from vlan_list.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Cc: Pedro Garcia <pedro.netdev@dondevamos.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/bonding/bond_main.c:179:12: warning: ‘disable_netpoll’
defined but not used
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a small possibility that a reader gets incorrect values on 32
bit arches. SNMP applications could catch incorrect counters when a
32bit high part is changed by another stats consumer/provider.
One way to solve this is to add a rtnl_link_stats64 param to all
ndo_get_stats64() methods, and also add such a parameter to
dev_get_stats().
Rule is that we are not allowed to use dev->stats64 as a temporary
storage for 64bit stats, but a caller provided area (usually on stack)
Old drivers (only providing get_stats() method) need no changes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for netpoll over bonded interfaces was added here:
commit f6dc31a85c
Author: WANG Cong <amwang@redhat.com>
Date: Thu May 6 00:48:51 2010 -0700
bonding: make bonding support netpoll
but it is bad enough that we should probably just disable netpoll over
bonding until some of the locking logic in the bonding driver is changed
or converted completely to RCU. Simple actions like changing the active
slave in active-backup mode will hang the box if a high enough printk
debugging level is enabled.
Keeping the old code around will be good for anyone that wants to work
on it (and for after the RCU conversion), so I propose this small patch
rather than ripping it all out.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use struct rtnl_link_stats64 as the statistics structure.
On 32-bit architectures, insert 32 bits of padding after/before each
field of struct net_device_stats to make its layout compatible with
struct rtnl_link_stats64. Add an anonymous union in net_device; move
stats into the union and add struct rtnl_link_stats64 stats64.
Add net_device_ops::ndo_get_stats64, implementations of which will
return a pointer to struct rtnl_link_stats64. Drivers that implement
this operation must not update the structure asynchronously.
Change dev_get_stats() to call ndo_get_stats64 if available, and to
return a pointer to struct rtnl_link_stats64. Change callers of
dev_get_stats() accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove useless union keyword in rtable, rt6_info and dn_route.
Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
v2: changed bonding module version, modified to apply on top of changes
from previous patch in series, and updated documentation to elaborate on
multiqueue awareness that now exists in bonding driver.
This patch give the user the ability to control the output slave for
round-robin and active-backup bonding. Similar functionality was
discussed in the past, but Jay Vosburgh indicated he would rather see a
feature like this added to existing modes rather than creating a
completely new mode. Jay's thoughts as well as Neil's input surrounding
some of the issues with the first implementation pushed us toward a
design that relied on the queue_mapping rather than skb marks.
Round-robin and active-backup modes were chosen as the first users of
this slave selection as they seemed like the most logical choices when
considering a multi-switch environment.
Round-robin mode works without any modification, but active-backup does
require inclusion of the first patch in this series and setting
the 'all_slaves_active' flag. This will allow reception of unicast traffic on
any of the backup interfaces.
This was tested with IPv4-based filters as well as VLAN-based filters
with good results.
More information as well as a configuration example is available in the
patch to Documentation/networking/bonding.txt.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
v2: changed parameter name from 'keep_all' to 'all_slaves_active' and
skipped setting slaves to inactive rather than creating a new flag at
Jay's suggestion.
In an effort to suppress duplicate frames on certain bonding modes
(specifically the modes that do not require additional configuration on
the switch or switches connected to the host), code was added in the
generic receive patch in 2.6.16. The current behavior works quite well
for most users, but there are some times it would be nice to restore old
functionality and allow all frames to make their way up the stack.
This patch adds support for a new module option and sysfs file called
'all_slaves_active' that will restore pre-2.6.16 functionality if the
user desires. The default value is '0' and retains existing behavior,
but the user can set it to '1' and allow all frames up if desired.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is stored but never restored. So remove this as it is useless.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the code that copies slave's mac address in case that's the first slave into
bond_enslave. Ifenslave app does this also but that's not a problem. This is
something that should be done in bond_enslave, and it shound not matter from
where is it called.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
V1->V2: corrected res/ret use
For some reason, MTU handling (storing, and restoring) is taking place in
bond_sysfs. The correct place for this code is in bond_enslave, bond_release.
So move it there.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on Andy's work, but I modified a lot.
Similar to the patch for bridge, this patch does:
1) implement the 2 methods to support netpoll for bonding;
2) modify netpoll during forwarding packets via bonding;
3) disable netpoll support of bonding when a netpoll-unabled device
is added to bonding;
4) enable netpoll support when all underlying devices support netpoll.
Cc: Andy Gospodarek <gospo@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Converts the list and the core manipulating with it to be the same as uc_list.
+uses two functions for adding/removing mc address (normal and "global"
variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
manipulation with lists on a sandbox (used in bonding and 80211 drivers)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
+little renaming of unicast functions to be smooth with multicast ones
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_uninit() is invoked with rtnl_lock held, when it does destroy_workqueue()
which will potentially flush all works in this workqueue, if we hold rtnl_lock
again in the work function, it will deadlock.
So move destroy_workqueue() to destructor where rtnl_lock is not held any more,
suggested by Eric.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a2fd940f (bonding: fix broken multicast with round-robin mode)
added a problem on litle endian machines.
drivers/net/bonding/bond_main.c:4159: warning: comparison is always
false due to limited range of data type
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Round-robin (mode 0) does nothing to ensure that any multicast traffic
originally destined for the host will continue to arrive at the host when
the link that sent the IGMP join or membership report goes down. One of
the benefits of absolute round-robin transmit.
Keeping track of subscribed multicast groups for each slave did not seem
like a good use of resources, so I decided to simply send on the
curr_active slave of the bond (typically the first enslaved device that
is up). This makes failover management simple as IGMP membership
reports only need to be sent when the curr_active_slave changes. I
tested this patch and it appears to work as expected.
Originally reported by Lon Hohberger <lhh@redhat.com>.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Lon Hohberger <lhh@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the type change, addresses in unicast and multicast lists wouldn't make
sense, not to mention possible different lenghts. So flush both lists here.
Note "dev_addr_discard" will be very soon replaced by "dev_mc_flush" (once
mc_list conversion will be done).
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the possibility to refuse the bonding type change for
other subsystems (such as for example bridge, vlan, etc.)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since generally there could be more netdevices changing type other
than bonding, making this event type name "bonding-unrelated"
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the register_netdevice() call fails, the newly allocated device is
not freed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to maintain stats in the bonding structure.
Use the instance of net_device_stats in netdevice.
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The convention for API functions in kernel is to return errno value;
bond_open would return -1 if alb setup failed. The only reason that
could happen is if kmalloc() failed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__net_init/__net_exit are apparently not going away, so use them
to full extent.
In some cases __net_init was removed, because it was called from
__net_exit code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows a bond device to specify an arp_ip_target as a host that is
not on the same vlan as the base bond device and still use arp
validation. A configuration like this, now works:
BONDING_OPTS="mode=active-backup arp_interval=1000 arp_ip_target=10.0.100.1 arp_validate=3"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
inet6 fe80::213:21ff:febe:33e9/64 scope link
valid_lft forever preferred_lft forever
9: bond0.100@bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
inet 10.0.100.2/24 brd 10.0.100.255 scope global bond0.100
inet6 fe80::213:21ff:febe:33e9/64 scope link
valid_lft forever preferred_lft forever
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.0.100.1
Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:40:05:30:ff:30
Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:13:21:be:33:e9
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Remove DRV_NAME from pr_<level>s
Consolidate long format strings
Remove some extra tab indents
Remove some unnecessary ()s from pr_<level>s arguments
Align pr_<level> arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only files where David Miller is the primary git-signer.
wireless, wimax, ixgbe, etc are not modified.
Compile tested x86 allyesconfig only
Not all files compiled (not x86 compatible)
Added a few > 80 column lines, which I ignored.
Existing checkpatch complaints ignored.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take advantage of the new pernet automatic storage management,
and stop using compatibility network namespace functions.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Don't call rtnl_link_unregister if rtnl_link_register fails
- Set .priv_size so we aren't stomping on uninitialized memory
when we use netdev_priv, on bond devices created with
ip link add type bond.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements a basic set of rtnl link ops and takes advantage of
the fact that rtnl_link_unregister kills all of the surviving
devices to all us to kill bond_free_all. A module alias
is added so ip link add can pull in the bonding module.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manually inline the code from bond_deinit to bond_uninit. bond_uninit
is the only caller and it is short.
Move the call of bond_release_all from the netdev notifier into
bond_uninit. The call site is effectively the same and performing
the call explicitly allows all the paths for destroying a
bonding device to behave the same way.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stop calling dev_get_by_name to see if the bond device already
exists. register_netdevice already does that.
Stop calling bond_deinit if register_netdevice fails as bond_uninit
is guaranteed to be called if bond_init succeeds.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch delegates the work of creating the sysfs groups
to the netdev layer and ultimately to the device layer. This
closes races between uevents.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In mii monitor mode, bond_check_dev_link() calls the the ioctl
handler of slave devices. It stores the ndo_do_ioctl function
pointer to a static (!) ioctl variable and later uses it to call the
handler with the IOCTL macro.
If another thread executes bond_check_dev_link() at the same time
(even with a different bond, which none of the locks prevent), a
race condition occurs. If the two racing slaves have different
drivers, this may result in one driver's ioctl handler being
called with a pointer to a net_device controlled with a different
driver, resulting in unpredictable breakage.
Unless I am overlooking something, the "static" must be a
copy'n'paste error (?).
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the bonding device is no longer used in determining the device to
which to send packets, it can be dropped from the argument list of the various
xmit_hash_policy calls.
Signed-off-by: Jasper Spaans <spaans@fox-it.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify bonding hash transmit policies to use the psource MAC address of
the packet instead of the MAC address configured for the bonding device.
The old sitation conflicts with the documentation.
Signed-off-by: Jasper Spaans <spaans@fox-it.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function bond_create_proc_entry is currently of type int.
Two versions of this function exist:
The one in the ifdef CONFIG_PROC_FS branch always return 0.
The one in the else branch (which is empty) return nothing.
When CONFIG_PROC_FS is undef, this cause the following warning:
drivers/net/bonding/bond_main.c: In function `bond_create_proc_entry':
drivers/net/bonding/bond_main.c:3393: warning: control reaches end of
non-void function
No caller of this function use the returned value.
So change the returned type from int to void and remove the
useless return 0; .
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable old_active is first set to bond->curr_active_slave.
Then, it is unconditionally set to new_active, without being used in between.
The first assignment, having no side effect, is useless.
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When parsing module parameters, bond_check_params() erroneously use
'xor_mode' as the name of a module parameter in an error message.
The right name for this parameter is 'xmit_hash_policy'.
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases there is not desirable to switch back to primary interface when
it's link recovers and rather stay with currently active one. We need to avoid
packetloss as much as we can in some cases. This is solved by introducing
primary_reselect option. Note that enslaved primary slave is set as current
active no matter what.
Patch modified by Jay Vosburgh as follows: fixed bug in action
after change of option setting via sysfs, revised the documentation
update, and bumped the bonding version number.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I was implementing primary_passive option (formely named primary_lazy) I've
run into troubles with ab_arp. This is the only mode which is not using
bond_select_active_slave() function to select active slave and instead it
selects it itself. This seems to be not the right behaviour and it would be
better to do it in bond_select_active_slave() for all cases. This patch makes
this happen. Please review.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes commit e36b9d16c6. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:
*. It assumes tha the device is UP when calling dev_close(), or otherwise
dev_close() has no affect. It is worth to mention that initscripts (Redhat)
and sysconfig (Suse) doesn't act the same in this matter.
*. dev_close() has other side affects, like deleting entries from the routing
table, which might be unnecessary.
The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are all drivers that don't touch real hardware.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: Have bond_check_dev_link examine netif_running
Some network devices do not call netif_carrier_off when they
are set administratively down. Have the bonding link check function
also inspect the netif_running state. Ignore netif_running if the
bond_check_dev_link function is called with "reporting" set, as in that
case it's inspecting the capabilities of the non-netif_carrier device
driver.
Signed-off-by: Petri Gynther <pgynther@google.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
max_bonds is of type int and cannot be greater than INT_MAX.
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding can use compare_ether_addr() in bond_release.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propogate the vlan_features of the slave devices to the bonding
master device, using the same logic as for regular features.
Tested by Or Gerlitz <ogerlitz@voltaire.com>, who also removed
the debug logic from the original test patch.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I did not introduce new lines over 80 chars. I even eliminated some of
them.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>