OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	b0ce3508b2	bonding: allow TSO being set on bonding master In some situations, we need to disable TSO on bonding slaves. bonding device automatically unset TSO in bond_fix_features(), and performance is not good because : 1) We consume more cpu cycles. 2) GSO segmentation has some bugs leading to out of order TCP packets if this segmentation is done before virtual device. This particular problem will be addressed in a separate patch. This patch allows TSO being set/unset on the bonding master, so that GSO segmentation is done after bonding layer. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Michał Mirosław <mirqus@gmail.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Maciej Żenczykowski <maze@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-16 15:02:01 -07:00
David S. Miller	58717686cf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c drivers/net/ethernet/emulex/benet/be.h include/net/tcp.h net/mac802154/mac802154.h Most conflicts were minor overlapping stuff. The be2net driver brought in some fixes that added __vlan_put_tag calls, which in net-next take an additional argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-30 03:55:20 -04:00
nikolay@redhat.com	c6cdcf6d82	bonding: fix locking in enslave failure path In commit `3c5913b53f` ("bonding: primary_slave & curr_active_slave are not cleaned on enslave failure") I didn't account for the use of curr_active_slave without curr_slave_lock and since there are such users, we should hold bond->lock for writing while setting it to NULL (in the NULL case we don't need the curr_slave_lock). Keeping the bond lock as to avoid the extra release/acquire cycle. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-25 04:03:21 -04:00
David S. Miller	6e0895c2ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be_main.c drivers/net/ethernet/intel/igb/igb_main.c drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c include/net/scm.h net/batman-adv/routing.c net/ipv4/tcp_input.c The e{uid,gid} --> {uid,gid} credentials fix conflicted with the cleanup in net-next to now pass cred structs around. The be2net driver had a bug fix in 'net' that overlapped with the VLAN interface changes by Patrick McHardy in net-next. An IGB conflict existed because in 'net' the build_skb() support was reverted, and in 'net-next' there was a comment style fix within that code. Several batman-adv conflicts were resolved by making sure that all calls to batadv_is_my_mac() are changed to have a new bat_priv first argument. Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO rewrite in 'net-next', mostly overlapping changes. Thanks to Stephen Rothwell and Antonio Quartulli for help with several of these merge resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-22 20:32:51 -04:00
nikolay@redhat.com	d632ce989c	bonding: in bond_mc_swap() bond's mc addr list is walked without lock Use netif_addr_lock_bh() to acquire the appropriate lock before walking. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:19 -04:00
nikolay@redhat.com	fc7a72ac86	bonding: disable netpoll on enslave failure slave_disable_netpoll() is not called upon enslave failure which would lead to a memory leak. Call slave_disable_netpoll() after err_detach as that's the first error path after enabling netpoll on that slave. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:19 -04:00
nikolay@redhat.com	3c5913b53f	bonding: primary_slave & curr_active_slave are not cleaned on enslave failure On enslave failure primary_slave can point to new_slave which is to be freed, and the same applies to curr_active_slave. So check if this is the case and clean up properly after err_detach because that's the first error code path after they're set. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:19 -04:00
nikolay@redhat.com	a506e7b479	bonding: vlans don't get deleted on enslave failure The main problem is with vid refcount which only gets bumped up. Delete the vlans after err_detach as that's the first error path after the vlans are added. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:18 -04:00
nikolay@redhat.com	25e40305d4	bonding: mc addresses don't get deleted on enslave failure Add bond_mc_list_flush() after err_detach as that's the first error path after the addresses are added. The main issue is the mc addresses' refcount which only gets bumped up. v2: update log message and don't move code unnecessarily Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:18 -04:00
Andy Gospodarek	bb5b052f75	bond: add support to read speed and duplex via ethtool This patch adds support for the get_settings ethtool op to the bonding driver. This was motivated by users who wanted to get the speed of the bond and compare that against throughput to understand utilization. The behavior before this patch was added was problematic when computing line utilization after trying to get link-speed and throughput via SNMP. Output from ethtool looks like this for a round-robin bond: Settings for bond0: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Speed: 11000Mb/s Duplex: Full Port: Other PHYAD: 0 Transceiver: internal Auto-negotiation: off MDI-X: Unknown Link detected: yes I tested this and verified it works as expected. A test was also done on a version backported to an older kernel and it worked well there. v2: Switch to using ethtool_cmd_speed_set to set speed, added check to SLAVE_IS_OK for each slave in bond, dropped mode-specific calculations as they were not needed, and set port type to 'Other.' v3: Fix useless assignment and checkpatch warning. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:39:50 -04:00
Patrick McHardy	1fd9b1fc31	net: vlan: prepare for 802.1ad support Make the encapsulation protocol value a property of VLAN devices and change the device lookup functions to take the protocol value into account. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:45:27 -04:00
Patrick McHardy	80d5c3689b	net: vlan: prepare for 802.1ad VLAN filtering offload Change the rx_{add,kill}_vid callbacks to take a protocol argument in preparation of 802.1ad support. The protocol argument used so far is always htons(ETH_P_8021Q). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:45:27 -04:00
Patrick McHardy	f646968f8f	net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_* Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow up patches will introduce 802.1ad server provider tagging (STAGs) and require the distinction for hardware not supporting acclerating both. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:45:26 -04:00
Eric Dumazet	4394542ca4	bonding: fix l23 and l34 load balancing in forwarding path Since commit `6b923cb718` (bonding: support for IPv6 transmit hashing) bonding doesn't properly hash traffic in forwarding setups. Vitaly V. Bursov diagnosed that skb_network_header_len() returned 0 in this case. More generally, the transport header might not be in the skb head. Use pskb_may_pull() & skb_header_pointer() to get it right, and use proto_ports_offset() in bond_xmit_hash_policy_l34() to get support for more protocols than TCP and UDP. Reported-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: John Eaglesham <linux@8192.net> Tested-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-18 15:07:29 -04:00
nikolay@redhat.com	b6a5a7b9a5	bonding: IFF_BONDING is not stripped on enslave failure While enslaving a new device and after IFF_BONDING flag is set, in case of failure it is not stripped from the device's priv_flags while cleaning up, which could lead to other problems. Cleaning at err_close because the flag is set after dev_open(). v2: no change Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-11 16:01:47 -04:00
nikolay@redhat.com	6101391d4a	bonding: fix netdev event NULL pointer dereference In commit `471cb5a33d` ("bonding: remove usage of dev->master") a bug was introduced which causes a NULL pointer dereference. If a bond device is in mode 6 (ALB) and a slave is added it will dereference a NULL pointer in bond_slave_netdev_event(). This is because in bond_enslave we have bond_alb_init_slave() which changes the MAC address of the slave and causes a NETDEV_CHANGEADDR. Then we have in bond_slave_netdev_event(): struct slave slave = bond_slave_get_rtnl(slave_dev); struct bonding bond = slave->bond; bond_slave_get_rtnl() dereferences slave_dev->rx_handler_data which at that time is NULL since netdev_rx_handler_register() is called later. This is fixed by checking if slave is NULL before dereferencing it. v2: Comment style changed. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-11 16:01:47 -04:00
nikolay@redhat.com	69b0216ac2	bonding: fix bonding_masters race condition in bond unloading While the bonding module is unloading, it is considered that after rtnl_link_unregister all bond devices are destroyed but since no synchronization mechanism exists, a new bond device can be created via bonding_masters before unregister_pernet_subsys which would lead to multiple problems (e.g. NULL pointer dereference, wrong RIP, list corruption). This patch fixes the issue by removing any bond devices left in the netns after bonding_masters is removed from sysfs. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-08 16:45:09 -04:00
nikolay@redhat.com	ffcdedb667	Revert "bonding: remove sysfs before removing devices" This reverts commit `4de79c737b`. This patch introduces a new bug which causes access to freed memory. In bond_uninit: list_del(&bond->bond_list); bond_list is linked in bond_net's dev_list which is freed by unregister_pernet_subsys. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-08 16:45:09 -04:00
David S. Miller	d978a6361a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/nfc/microread/mei.c net/netfilter/nfnetlink_queue_core.c Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which some cleanups are going to go on-top. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-07 18:37:01 -04:00
Veaceslav Falico	4de79c737b	bonding: remove sysfs before removing devices We have a race condition if we try to rmmod bonding and simultaneously add a bond master through sysfs. In bonding_exit() we first remove the devices (through rtnl_link_unregister() ) and only after that we remove the sysfs. If we manage to add a device through sysfs after that the devices were removed - we'll end up with that device/sysfs structure and with the module unloaded. Fix this by first removing the sysfs and only after that calling rtnl_link_unregister(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-05 00:46:13 -04:00
David S. Miller	d662483264	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull net into net-next to get the synchronize_net() bug fix in bonding. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-03 01:31:54 -04:00
Veaceslav Falico	fcd99434fb	bonding: get netdev_rx_handler_unregister out of locks Now that netdev_rx_handler_unregister contains synchronize_net(), we need to call it outside of bond->lock, cause it might sleep. Also, remove the already unneded synchronize_net(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-02 12:05:28 -04:00
Veaceslav Falico	ad999eee66	bonding: cleanup unneeded rcu_read_lock() bond_resend_igmp_join_requests_delayed() calls _resend_igmp_join_requests() under rcu_read_lock(), while it gets its own rcu_read_lock() for the whole function. Remove the lock from the _delayed function. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-26 12:49:40 -04:00
Veaceslav Falico	876254ae27	bonding: don't call update_speed_duplex() under spinlocks bond_update_speed_duplex() might sleep while calling underlying slave's routines. Move it out of atomic context in bond_enslave() and remove it from bond_miimon_commit() - it was introduced by commit `546add79`, however when the slave interfaces go up/change state it's their responsibility to fire NETDEV_UP/NETDEV_CHANGE events so that bonding can properly update their speed. I've tested it on all combinations of ifup/ifdown, autoneg/speed/duplex changes, remote-controlled and local, on (not) MII-based cards. All changes are visible. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-13 04:53:17 -04:00
Veaceslav Falico	80028ea1c0	bonding: fire NETDEV_RELEASE event only on 0 slaves Currently, if we set up netconsole over bonding and release a slave, netconsole will stop logging on the whole bonding device. Change the behavior to stop the netconsole only when the last slave is released. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-07 16:15:18 -05:00
Jiri Pirko	6c8c4e4c24	bond: check if slave count is 0 in case when deciding to take slave's mac in bond_enslave(), check slave_cnt before actually using slave address. introduced by: commit `409cc1f8a4` (bond: have random dev address by default instead of zeroes) Reported-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-26 17:30:38 -05:00
Doug Goldstein	b3f92b63c4	bonding: set sysfs device_type to 'bond' Sets the sysfs device_type to 'bond' for udev. This allows udev rules to be created for bond devices. This is similar to how other network devices set their device_type. Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-19 00:51:09 -05:00
nikolay@redhat.com	0896341a44	bonding: fix bond_release_all inconsistencies This patch fixes the following inconsistencies in bond_release_all: - IFF_BONDING flag is not stripped from slaves - MTU is not restored - no netdev notifiers are sent Instead of trying to keep bond_release and bond_release_all in sync I think we can re-use bond_release as the environment for calling it is correct (RTNL is held). I have been running tests for the past week and they came out successful. The only way for bond_release to fail is for the slave to be attached in a different bond or to not be a slave but that cannot happen as RTNL is held and no slave manipulations can be achieved. V2: As suggested bond_release is renamed to __bond_release_one with a new parameter "all" introduced so to avoid calling unnecessary code while destroying a bond, and a wrapper for it called bond_release is created because of ndo_del_link. bond_release_all() is removed. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-19 00:51:09 -05:00
Neil Horman	2cde6acd49	netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock __netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is already held. The mechanism is used to asynchronously call __netpoll_cleanup outside of the holding of the rtnl_lock, so as to avoid deadlock. Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the rtnl_lock must be held while calling it. Further, it cannot be held, because rcu callbacks may be issued in softirq contexts, which cannot sleep. Fix this by converting the rcu callback to a work queue that is guaranteed to get scheduled in process context, so that we can hold the rtnl properly while calling __netpoll_cleanup Tested successfully by myself. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: Cong Wang <amwang@redhat.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-11 19:19:33 -05:00
Gao feng	387ff91184	netns: bond: allow unprivileged users to control bond device reduce the permission check of bond device's ioctl. allow the userns root to control the bond device. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-04 13:12:16 -05:00
Jiri Pirko	409cc1f8a4	bond: have random dev address by default instead of zeroes Makes more sense to have randomly generated address by default than to have all zeroes. It also allows user to for example put the bond into bridge without need to have any slaves in it. Also note that this changes only behaviour of bonds with no slaves. Once the first slave device is enslaved, its address will be used (no change here). Also, fix dev_assign_type values on the way. Reported-by: Pavel Šimerda <psimerda@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 15:34:00 -05:00
Jiri Pirko	7826d43f2d	ethtool: fix drvinfo strings set in drivers Use strlcpy where possible to ensure the string is \0 terminated. Use always sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN and custom defines. Use snprintf instead of sprint. Remove unnecessary inits of ->fw_version Remove unnecessary inits of drvinfo struct. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:06:31 -08:00
Jiri Pirko	471cb5a33d	bonding: remove usage of dev->master Benefit from new upper dev list and free bonding from dev->master usage. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:50 -08:00
Konstantin Khlebnikov	cfb6f99dd9	bonding: do not cancel works in bond_uninit() Bonding initializes these works in bond_open() and cancels in bond_close(), thus in bond_uninit() they are already canceled but may be unitialized yet. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Nikolay Aleksandrov <nikolay@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-14 13:14:07 -05:00
Linus Torvalds	6be35c700f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David Miller: 1) Allow to dump, monitor, and change the bridge multicast database using netlink. From Cong Wang. 2) RFC 5961 TCP blind data injection attack mitigation, from Eric Dumazet. 3) Networking user namespace support from Eric W. Biederman. 4) tuntap/virtio-net multiqueue support by Jason Wang. 5) Support for checksum offload of encapsulated packets (basically, tunneled traffic can still be checksummed by HW). From Joseph Gasparakis. 6) Allow BPF filter access to VLAN tags, from Eric Dumazet and Daniel Borkmann. 7) Bridge port parameters over netlink and BPDU blocking support from Stephen Hemminger. 8) Improve data access patterns during inet socket demux by rearranging socket layout, from Eric Dumazet. 9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and Jon Maloy. 10) Update TCP socket hash sizing to be more in line with current day realities. The existing heurstics were choosen a decade ago. From Eric Dumazet. 11) Fix races, queue bloat, and excessive wakeups in ATM and associated drivers, from Krzysztof Mazur and David Woodhouse. 12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions in VXLAN driver, from David Stevens. 13) Add "oops_only" mode to netconsole, from Amerigo Wang. 14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also allow DCB netlink to work on namespaces other than the initial namespace. From John Fastabend. 15) Support PTP in the Tigon3 driver, from Matt Carlson. 16) tun/vhost zero copy fixes and improvements, plus turn it on by default, from Michael S. Tsirkin. 17) Support per-association statistics in SCTP, from Michele Baldessari. And many, many, driver updates, cleanups, and improvements. Too numerous to mention individually. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) net/mlx4_en: Add support for destination MAC in steering rules net/mlx4_en: Use generic etherdevice.h functions. net: ethtool: Add destination MAC address to flow steering API bridge: add support of adding and deleting mdb entries bridge: notify mdb changes via netlink ndisc: Unexport ndisc_{build,send}_skb(). uapi: add missing netconf.h to export list pkt_sched: avoid requeues if possible solos-pci: fix double-free of TX skb in DMA mode bnx2: Fix accidental reversions. bna: Driver Version Updated to 3.1.2.1 bna: Firmware update bna: Add RX State bna: Rx Page Based Allocation bna: TX Intr Coalescing Fix bna: Tx and Rx Optimizations bna: Code Cleanup and Enhancements ath9k: check pdata variable before dereferencing it ath5k: RX timestamp is reported at end of frame ath9k_htc: RX timestamp is reported at end of frame ...	2012-12-12 18:07:07 -08:00
Ben Hutchings	c772dde343	bonding: Fix check for ethtool get_link operation support Since commit `2c60db0370` ('net: provide a default dev->ethtool_ops') all devices have a non-null ethtool_ops. Test only dev->ethtool_ops->get_link in both places where we care. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-07 14:35:35 -05:00
nikolay@redhat.com	90fb6250c5	bonding: make arp_ip_target parameter checks consistent with sysfs The module can be loaded with arp_ip_target="255.255.255.255" which makes it impossible to remove as the function in sysfs checks for that value, so we make the parameter checks consistent with sysfs. v2: Fix formatting v3: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-29 13:13:15 -05:00
nikolay@redhat.com	fbb0c41b81	bonding: fix miimon and arp_interval delayed work race conditions First I would give three observations which will be used later. Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq) This usage is wrong because the pending bit is cleared just before the work's fn is executed and if the function re-arms itself we might end up with the work still running. It's safe to call cancel_delayed_work_sync() even if the work is not queued at all. Observation 2: Use of INIT_DELAYED_WORK() Work needs to be initialized only once prior to (de/en)queueing. Observation 3: IFF_UP is set only after ndo_open is called Related race conditions: 1. Race between bonding_store_miimon() and bonding_store_arp_interval() Because of Obs.1 we can end up having both works enqueued. 2. Multiple races with INIT_DELAYED_WORK() Since the works are not protected by anything between INIT_DELAYED_WORK() and calls to (en/de)queue it is possible for races between the following functions: (races are also possible between the calls to INIT_DELAYED_WORK() and workqueue code) bonding_store_miimon() - bonding_store_arp_interval(), bond_close(), bond_open(), enqueued functions bonding_store_arp_interval() - bonding_store_miimon(), bond_close(), bond_open(), enqueued functions 3. By Obs.1 we need to change bond_cancel_all() Bugs 1 and 2 are fixed by moving all work initializations in bond_open which by Obs. 2 and Obs. 3 and the fact that we make sure that all works are cancelled in bond_close(), is guaranteed not to have any work enqueued. Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so they can't race with bond_close and bond_open. The opposing work is cancelled only if the IFF_UP flag is set and it is cancelled unconditionally. The opposing work is already cancelled if the interface is down so no need to cancel it again. This way we don't need new synchronizations for the bonding workqueue. These bugs (and fixes) are tied together and belong in the same patch. Note: I have left 1 line intentionally over 80 characters (84) because I didn't like how it looks broken down. If you'd prefer it otherwise, then simply break it. v2: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-29 13:13:15 -05:00
Michal Kubeček	4e591b93d5	bonding: in balance-rr mode, set curr_active_slave only if it is up If all slaves of a balance-rr bond with ARP monitor are enslaved with down link state, bond keeps down state even after slaves go up. This is caused by bond_enslave() setting curr_active_slave to first slave not taking into account its link state. As bond_loadbalance_arp_mon() uses curr_active_slave to identify whether slave's down->up transition should update bond's link state, bond stays down even if slaves are up (until first slave goes from up to down at least once). Before commit `f31c7937` "bonding: start slaves with link down for ARP monitor", this was masked by slaves always starting in UP state with ARP monitor (and MII monitor not relying on curr_active_slave being NULL if there is no slave up). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-28 11:28:45 -05:00
Sarveshwar Bandi	0e376bd0b7	bonding: Bonding driver does not consider the gso_max_size/gso_max_segs setting of slave devices. Patch sets the lowest gso_max_size and gso_max_segs values of the slave devices during enslave and detach. Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-21 11:50:31 -05:00
Jiri Pirko	55462cf30a	vlan: fix bond/team enslave of vlan challenged slave/port In vlan_uses_dev() check for number of vlan devs rather than existence of vlan_info. The reason is that vlan id 0 is there without appropriate vlan dev on it by default which prevented from enslaving vlan challenged dev. Reported-by: Jon Stanley <jstanley@rmrf.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-16 14:41:46 -04:00
Eric Dumazet	49ee49202b	bonding: set qdisc_tx_busylock to avoid LOCKDEP splat If a qdisc is installed on a bonding device, its possible to get following lockdep splat under stress : ============================================= [ INFO: possible recursive locking detected ] 3.6.0+ #211 Not tainted --------------------------------------------- ping/4876 is trying to acquire lock: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 but task is already holding lock: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); * DEADLOCK * May be due to missing lock nesting notation 6 locks held by ping/4876: #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e5030>] raw_sendmsg+0x600/0xc30 #1: (rcu_read_lock_bh){.+....}, at: [<ffffffff815ba4bd>] ip_finish_output+0x12d/0x870 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830 #3: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 #4: (&bond->lock){++.?..}, at: [<ffffffffa02128c1>] bond_start_xmit+0x31/0x4b0 [bonding] #5: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830 stack backtrace: Pid: 4876, comm: ping Not tainted 3.6.0+ #211 Call Trace: [<ffffffff810a0145>] __lock_acquire+0x715/0x1b80 [<ffffffff810a256b>] ? mark_held_locks+0x9b/0x100 [<ffffffff810a1bf2>] lock_acquire+0x92/0x1d0 [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830 [<ffffffff81726b7c>] _raw_spin_lock+0x3c/0x50 [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830 [<ffffffff8106264d>] ? rcu_read_lock_bh_held+0x5d/0x90 [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570 [<ffffffffa0212a6a>] bond_start_xmit+0x1da/0x4b0 [bonding] [<ffffffff815796d0>] dev_hard_start_xmit+0x240/0x6b0 [<ffffffff81597c6e>] sch_direct_xmit+0xfe/0x2a0 [<ffffffff8157a249>] dev_queue_xmit+0x199/0x830 [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570 [<ffffffff815ba96f>] ip_finish_output+0x5df/0x870 [<ffffffff815ba4bd>] ? ip_finish_output+0x12d/0x870 [<ffffffff815bb964>] ip_output+0x54/0xf0 [<ffffffff815bad48>] ip_local_out+0x28/0x90 [<ffffffff815bc444>] ip_send_skb+0x14/0x50 [<ffffffff815bc4b2>] ip_push_pending_frames+0x32/0x40 [<ffffffff815e536a>] raw_sendmsg+0x93a/0xc30 [<ffffffff8128d570>] ? selinux_file_send_sigiotask+0x1f0/0x1f0 [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80 [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220 [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80 [<ffffffff815f6855>] inet_sendmsg+0x125/0x240 [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220 [<ffffffff8155cddb>] sock_sendmsg+0xab/0xe0 [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0 [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0 [<ffffffff8155d18c>] __sys_sendmsg+0x37c/0x390 [<ffffffff81195b2a>] ? fsnotify+0x2ca/0x7e0 [<ffffffff811958e8>] ? fsnotify+0x88/0x7e0 [<ffffffff81361f36>] ? put_ldisc+0x56/0xd0 [<ffffffff8116f98a>] ? fget_light+0x3da/0x510 [<ffffffff8155f6c4>] sys_sendmsg+0x44/0x80 [<ffffffff8172fc22>] system_call_fastpath+0x16/0x1b Avoid this problem using a distinct lock_class_key for bonding devices. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-04 15:53:48 -04:00
Jiri Bohac	da210f5590	bonding: add some slack to arp monitoring time limits Currently, all the time limits in the bonding ARP monitor are in multiples of arp_interval -- the time interval at which the ARP monitor is periodically scheduled. With a fast network round-trip and a little scheduling latency of the ARP monitor work, a limit of ndelta_in_ticks may effectively mean (n-1)delta_in_ticks. This is fatal in case of n==1 (the link will stay down forever) and makes the behaviour non-deterministic in all the other cases. Add a delta_in_ticks/2 time slack to all the time limits. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-31 16:37:12 -04:00
John Eaglesham	6b923cb718	bonding: support for IPv6 transmit hashing Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham <linux@8192.net> Signed-off-by: John Eaglesham <linux@8192.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:49:30 -07:00
David S. Miller	1304a7343b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-08-22 14:21:38 -07:00
Amerigo Wang	e15c3c2294	netpoll: check netpoll tx status on the right device Although this doesn't matter actually, because netpoll_tx_running() doesn't use the parameter, the code will be more readable. For team_dev_queue_xmit() we have to move it down to avoid compile errors. Cc: David Miller <davem@davemloft.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:32 -07:00
Amerigo Wang	38e6bc185d	netpoll: make __netpoll_cleanup non-block Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup() may be called with read_lock() held too, so we should make them non-block, by moving the cleanup and kfree() to call_rcu_bh() callbacks. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:30 -07:00
Amerigo Wang	47be03a28c	netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:30 -07:00
Amerigo Wang	b7bc2a5b5b	net: remove netdev_bonding_change() I don't see any benifits to use netdev_bonding_change() than using call_netdevice_notifiers() directly. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:28:24 -07:00
Jiri Pirko	df4ab5b3c2	net: rename bond_queue_mapping to slave_dev_queue_mapping As this is going to be used not only by bonding. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 11:07:00 -07:00
Jiri Pirko	d40156aa5e	rtnl: allow to specify different num for rx and tx queue count Also cut out unused function parameters and possible err in return value. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 11:06:59 -07:00
Eric Dumazet	b6fe83e952	bonding: refine IFF_XMIT_DST_RELEASE capability Some workloads greatly benefit of IFF_XMIT_DST_RELEASE capability on output net device, avoiding dirtying dst refcount. bonding currently disables IFF_XMIT_DST_RELEASE unconditionally. If all slaves have the IFF_XMIT_DST_RELEASE bit set, then bonding master can also have it in its priv_flags Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-18 09:31:25 -07:00
Jiri Pirko	30fdd8a082	netpoll: move np->dev and np->dev_name init into __netpoll_setup() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-17 09:02:36 -07:00
David S. Miller	04c9f416e3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/batman-adv/bridge_loop_avoidance.c net/batman-adv/bridge_loop_avoidance.h net/batman-adv/soft-interface.c net/mac80211/mlme.c With merge help from Antonio Quartulli (batman-adv) and Stephen Rothwell (drivers/net/usb/qmi_wwan.c). The net/mac80211/mlme.c conflict seemed easy enough, accounting for a conversion to some new tracing macros. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-10 23:56:33 -07:00
Eric W. Biederman	a64d49c3dd	bonding: Manage /proc/net/bonding/ entries from the netdev events It was recently reported that moving a bonding device between network namespaces causes warnings from /proc. It turns out after the move we were trying to add and to remove the /proc/net/bonding entries from the wrong network namespace. Move the bonding /proc registration code into the NETDEV_REGISTER and NETDEV_UNREGISTER events where the proc registration and unregistration will always happen at the right time. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-09 14:49:15 -07:00
Eric Dumazet	0450243096	bonding: drop_monitor aware When packets are dropped in TX path, its better to use kfree_skb() instead of dev_kfree_skb() to give proper drop_monitor events. Also move the kfree_skb() call after read_unlock() in bond_alb_xmit() and bond_xmit_activebackup() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-13 16:00:26 -07:00
David S. Miller	43b03f1f6d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: MAINTAINERS drivers/net/wireless/iwlwifi/pcie/trans.c The iwlwifi conflict was resolved by keeping the code added in 'net' that turns off the buggy chip feature. The MAINTAINERS conflict was merely overlapping changes, one change updated all the wireless web site URLs and the other changed some GIT trees to be Johannes's instead of John's. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-12 21:59:18 -07:00
Eric Dumazet	de063b7040	bonding: remove packet cloning in recv_probe() Cloning all packets in input path have a significant cost. Use skb_header_pointer()/skb_copy_bits() instead of pskb_may_pull() so that recv_probe handlers (bond_3ad_lacpdu_recv / bond_arp_rcv / rlb_arp_recv ) dont touch input skb. bond_handle_frame() can avoid the skb_clone()/dev_kfree_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-12 18:51:09 -07:00
Eric Dumazet	5ee31c6898	bonding: Fix corrupted queue_mapping In the transmit path of the bonding driver, skb->cb is used to stash the skb->queue_mapping so that the bonding device can set its own queue mapping. This value becomes corrupted since the skb->cb is also used in __dev_xmit_skb. When transmitting through bonding driver, bond_select_queue is called from dev_queue_xmit. In bond_select_queue the original skb->queue_mapping is copied into skb->cb (via bond_queue_mapping) and skb->queue_mapping is overwritten with the bond driver queue. Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes the packet length into skb->cb, thereby overwriting the stashed queue mappping. In bond_dev_queue_xmit (called from hard_start_xmit), the queue mapping for the skb is set to the stashed value which is now the skb length and hence is an invalid queue for the slave device. If we want to save skb->queue_mapping into skb->cb[], best place is to add a field in struct qdisc_skb_cb, to make sure it wont conflict with other layers (eg : Qdiscc, Infiniband...) This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8 bytes : netem qdisc for example assumes it can store an u64 in it, without misalignment penalty. Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[]. The largest user is CHOKe and it fills it. Based on a previous patch from Tom Herbert. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Tom Herbert <therbert@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Roland Dreier <roland@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-12 15:29:21 -07:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
Joe Perches	2e42e4747e	drivers/net: Convert compare_ether_addr to ether_addr_equal Use the new bool function ether_addr_equal to add some clarity and reduce the likelihood for misuse of compare_ether_addr for sorting. Done via cocci script: $ cat compare_ether_addr.cocci @@ expression a,b; @@ - !compare_ether_addr(a, b) + ether_addr_equal(a, b) @@ expression a,b; @@ - compare_ether_addr(a, b) + !ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) == 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) != 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) == 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) != 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !!ether_addr_equal(a, b) + ether_addr_equal(a, b) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:33:01 -04:00
Jiri Bohac	13a8e0c8cd	bonding: don't increase rx_dropped after processing LACPDUs Since commit `3aba891d`, bonding processes LACP frames (802.3ad mode) with bond_handle_frame(). Currently a copy of the skb is made and the original is left to be processed by other rx_handlers and the rest of the network stack by returning RX_HANDLER_ANOTHER. As there is no protocol handler for PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped increased. Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED if bonding has processed the LACP frame. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:30:01 -04:00
Rick Jones	13b95fb714	bonding: bond_update_speed_duplex() can return void since no callers check its return As none of the callers of bond_update_speed_duplex (need to) check its return value, there is little point in it returning anything. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-27 00:03:35 -04:00
Michal Kubeček	f31c7937c2	bonding: start slaves with link down for ARP monitor Initialize slave device link state as down if ARP monitor is active and net_carrier_ok() returns zero. Also shift initial value of its last_arp_tx so that it doesn't immediately cause fake detection of "up" state. When ARP monitoring is used, initializing the slave device with up link state can cause ARP monitor to detect link failure before the device is really up (with igb driver, this can take more than two seconds). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-19 15:25:02 -04:00
David S. Miller	64d683c582	bonding: Fixup get_tx_queue() op second arg type. I missed this when fixing up the warning in the previous commit. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-13 15:02:35 -04:00
stephen hemminger	efacb309b5	rtnetlink & bonding: change args got get_tx_queues Change get_tx_queues, drop unsused arg/return value real_tx_queues, and use return by value (with error) rather than call by reference. Probably bonding should just change to LLTX and the whole get_tx_queues API could disappear! Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-13 13:31:00 -04:00
Veaceslav Falico	5a4309746c	bonding: properly unset current_arp_slave on slave link up When a slave comes up, we're unsetting the current_arp_slave without removing active flags from it, which can lead to situations where we have more than one slave with active flags in active-backup mode. To avoid this situation we must remove the active flags from a slave before removing it as a current_arp_slave. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-05 19:07:59 -04:00
Shlomo Pongratz	234bcf8a49	net/bonding: correctly proxy slave neigh param setup ndo function The current implemenation was buggy for slaves who use ndo_neigh_setup, since the networking stack invokes the bonding device ndo entry (from neigh_params_alloc) before any devices are enslaved, and the bonding driver can't further delegate the call at that point in time. As a result when bonding IPoIB devices, the neigh_cleanup hasn't been called. Fix that by deferring the actual call into the slave ndo_neigh_setup from the time the bonding neigh_setup is called. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-04 18:11:12 -04:00
Shlomo Pongratz	2af73d4b2a	net/bonding: emit address change event also in bond_release commit `7d26bb103c` "bonding: emit event when bonding changes MAC" didn't take care to emit the NETDEV_CHANGEADDR event in bond_release, where bonding actually changes the mac address (to all zeroes). As a result the neighbours aren't deleted by the core networking code (which does so upon getting that event). Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-04 18:11:12 -04:00
Linus Torvalds	ed359a3b7b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Provide device string properly for USB i2400m wimax devices, also don't OOPS when providing firmware string. From Phil Sutter. 2) Add support for sh_eth SH7734 chips, from Nobuhiro Iwamatsu. 3) Add another device ID to USB zaurus driver, from Guan Xin. 4) Loop index start in pool vector iterator is wrong causing MAC to not get configured in bnx2x driver, fix from Dmitry Kravkov. 5) EQL driver assumes HZ=100, fix from Eric Dumazet. 6) Now that skb_add_rx_frag() can specify the truesize increment separately, do so in f_phonet and cdc_phonet, also from Eric Dumazet. 7) virtio_net accidently uses net_ratelimit() not only on the kernel warning but also the statistic bump, fix from Rick Jones. 8) ip_route_input_mc() uses fixed init_net namespace, oops, use dev_net(dev) instead. Fix from Benjamin LaHaise. 9) dev_forward_skb() needs to clear the incoming interface index of the SKB so that it looks like a new incoming packet, also from Benjamin LaHaise. 10) iwlwifi mistakenly initializes a channel entry as 2GHZ instead of 5GHZ, fix from Stanislav Yakovlev. 11) Missing kmalloc() return value checks in orinoco, from Santosh Nayak. 12) ath9k doesn't check for HT capabilities in the right way, it is checking ht_supported instead of the ATH9K_HW_CAP_HT flag. Fix from Sujith Manoharan. 13) Fix x86 BPF JIT emission of 16-bit immediate field of AND instructions, from Feiran Zhuang. 14) Avoid infinite loop in GARP code when registering sysfs entries. From David Ward. 15) rose protocol uses memcpy instead of memcmp in a device address comparison, oops. Fix from Daniel Borkmann. 16) Fix build of lpc_eth due to dev_hw_addr_rancom() interface being renamed to eth_hw_addr_random(). From Roland Stigge. 17) Make ipv6 RTM_GETROUTE interpret RTA_IIF attribute the same way that ipv4 does. Fix from Shmulik Ladkani. 18) via-rhine has an inverted bit test, causing suspend/resume regressions. Fix from Andreas Mohr. 19) RIONET assumes 4K page size, fix from Akinobu Mita. 20) Initialization of imask register in sky2 is buggy, because bits are "or'd" into an uninitialized local variable. Fix from Lino Sanfilippo. 21) Fix FCOE checksum offload handling, from Yi Zou. 22) Fix VLAN processing regression in e1000, from Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits) sky2: dont overwrite settings for PHY Quick link tg3: Fix 5717 serdes powerdown problem net: usb: cdc_eem: fix mtu net: sh_eth: fix endian check for architecture independent usb/rtl8150 : Remove duplicated definitions rionet: fix page allocation order of rionet_active via-rhine: fix wait-bit inversion. ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4 net: lpc_eth: Fix rename of dev_hw_addr_random net/netfilter/nfnetlink_acct.c: use linux/atomic.h rose_dev: fix memcpy-bug in rose_set_mac_address Fix non TBI PHY access; a bad merge undid bug fix in a previous commit. net/garp: avoid infinite loop if attribute already exists x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of AND bonding: emit event when bonding changes MAC mac80211: fix oper channel timestamp updation ath9k: Use HW HT capabilites properly MAINTAINERS: adding maintainer for ipw2x00 net: orinoco: add error handling for failed kmalloc(). net/wireless: ipw2x00: fix a typo in wiphy struct initilization ...	2012-04-02 17:53:39 -07:00
Weiping Pan	7d26bb103c	bonding: emit event when bonding changes MAC When a bonding device is configured with fail_over_mac=active, we expect to see the MAC address of the new active slave as the source MAC address after failover. But we see that the source MAC address is the MAC address of previous active slave. Emit NETDEV_CHANGEADDR event when bonding changes its MAC address, in order to let arp_netdev_event flush neighbour cache and route cache. How to reproduce this bug ? -----------hostB---------------- hostA ----- switch ---\|-- eth0--bond0(192.168.100.2/24)\| (192.168.100.1/24 \--\|-- eth1-/ \| -------------------------------- 1 on hostB, modprobe bonding mode=1 miimon=500 fail_over_mac=active downdelay=1000 num_grat_arp=1 ifconfig bond0 192.168.100.2/24 up ifenslave bond0 eth0 ifenslave bond0 eth1 then eth0 is the active slave, and MAC of bond0 is MAC of eth0. 2 on hostA, ping 192.168.100.2 3 on hostB, tcpdump -i bond0 -p icmp -XXX you will see bond0 uses MAC of eth0 as source MAC in icmp reply. 4 on hostB, ifconfig eth0 down tcpdump -i bond0 -p icmp -XXX (just keep it running in step 3) you will see first bond0 uses MAC of eth1 as source MAC in icmp reply, then it will use MAC of eth0 as source MAC. Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-29 18:11:57 -04:00
Linus Torvalds	0195c00244	Disintegrate and delete asm/system.h -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+ /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD NypQthI85pc= =G9mT -----END PGP SIGNATURE----- Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...	2012-03-28 15:58:21 -07:00
David Howells	9ffc93f203	Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\sinclude\s<asm/system[.]h>.\n!!' `grep -Irl '^#\sinclude\s<asm/system[.]h>' ` Signed-off-by: David Howells <dhowells@redhat.com>	2012-03-28 18:30:03 +01:00
Andy Gospodarek	eaddcd7690	bonding: remove entries for master_ip and vlan_ip and query devices instead The following patch aimed to resolve an issue where secondary, tertiary, etc. addresses added to bond interfaces could overwrite the bond->master_ip and vlan_ip values. commit `917fbdb32f` Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com> Date: Wed Nov 23 23:37:15 2011 +0000 bonding: only use primary address for ARP That patch was good because it prevented bonds using ARP monitoring from sending frames with an invalid source IP address. Unfortunately, it didn't always work as expected. When using an ioctl (like ifconfig does) to set the IP address and netmask, 2 separate ioctls are actually called to set the IP and netmask if the mask chosen doesn't match the standard mask for that class of address. The first ioctl did not have a mask that matched the one in the primary address and would still cause the device address to be overwritten. The second ioctl that was called to set the mask would then detect as secondary and ignored, but the damage was already done. This was not an issue when using an application that used netlink sockets as the setting of IP and netmask came down at once. The inconsistent behavior between those two interfaces was something that needed to be resolved. While I was thinking about how I wanted to resolve this, Ralf Zeidler came with a patch that resolved this on a RHEL kernel by keeping a full shadow of the entries in dev->ifa_list for the bonding device and vlan devices in the bonding driver. I didn't like the duplication of the list as I want to see the 'bonding' struct and code shrink rather than grow, but liked the general idea. As the Subject indicates this patch drops the master_ip and vlan_ip elements from the 'bonding' and 'vlan_entry' structs, respectively. This can be done because a device's address-list is now traversed to determine the optimal source IP address for ARP requests and for checks to see if the bonding device has a particular IP address. This code could have all be contained inside the bonding driver, but it made more sense to me to EXPORT and call inet_confirm_addr since it did exactly what was needed. I tested this and a backported patch and everything works as expected. Ralf also helped with verification of the backported patch. Thanks to Ralf for all his help on this. v2: Whitespace and organizational changes based on suggestions from Jay Vosburgh and Dave Miller. v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's suggestion. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> CC: Ralf Zeidler <ralf.zeidler@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-22 22:36:17 -04:00
Peter Pan(潘卫平)	1c3ac4289a	bonding: send igmp report for its master Liang Zheng(lzheng@redhat.com) found that in the following topo, bonding does not send igmp report when we trigger a fail-over of bonding. eth0-- \|-- bond0 -- br0 eth1-- modprobe bonding mode=1 miimon=100 resend_igmp=10 ifconfig bond0 up ifenslave bond0 eth0 eth1 brctl addbr br0 ifconfig br0 192.168.100.2/24 up brctl addif br0 bond0 Add 192.168.100.2(br0) into a multicast group, like 224.10.10.10, then trigger a fali-over in bonding. You can see that parameter "resend_igmp" does not work. The reason is that when we add br0 into a multicast group, it does not propagate multicast knowledge down to its ports. If we choose to propagate multicast knowledge down to all ports for bridge, then we have to track every change that is done to bridge, and keep a backup for all ports. It is hard to track, I think. Instead I choose to modify bonding to send igmp report for its master. Changelog: V2: correct comments V3: move this check into bond_resend_igmp_join_requests() V4: only send igmp reports if bond is enslaved to a bridge Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-19 18:02:05 -04:00
stephen hemminger	f7d9821a6a	bonding: fix error handling if slave is busy (v2) If slave device already has a receive handler registered, then the error unwind of bonding device enslave function is broken. The following will leave a pointer to freed memory in the slave device list, causing a later kernel panic. # modprobe dummy # ip li add dummy0-1 link dummy0 type macvlan # modprobe bonding # echo +dummy0 >/sys/class/net/bond0/bonding/slaves The fix is to detach the slave (which removes it from the list) in the unwind path. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-03 12:49:16 -05:00
Jiri Pirko	87002b03ba	net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls This patch adds wrapper for ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid functions. Check for NETIF_F_HW_VLAN_FILTER feature is done in this wrapper. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-08 19:52:42 -05:00
Jiri Pirko	8e586137e6	net: make vlan ndo_vlan_rx_[add/kill]_vid return error value Let caller know the result of adding/removing vlan id to/from vlan filter. In some drivers I make those functions to just return 0. But in those where there is able to see if hw setup went correctly, return value is set appropriately. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-08 19:52:37 -05:00
David S. Miller	b3613118eb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2011-12-02 13:49:21 -05:00
Henrik Saavedra Persson	917fbdb32f	bonding: only use primary address for ARP Only use the primary address of the bond device for master_ip. This will prevent changing the ARP source address in Active-Backup mode whenever a secondry address is added to the bond device. Signed-off-by: Henrik Saavedra Persson <henrik.e.persson@ericsson.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@drr.davemloft.net>	2011-11-30 22:59:11 -05:00
Michał Mirosław	34324dc2bf	net: remove NETIF_F_NO_CSUM feature bit Only distinct use is checking if NETIF_F_NOCACHE_COPY should be enabled by default. The check heuristics is altered a bit here, so it hits other people than before. The default shouldn't be trusted for performance-critical cases anyway. For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:43:12 -05:00
Michał Mirosław	c8f44affb7	net: introduce and use netdev_features_t for device features sets v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:43:10 -05:00
Dan Carpenter	589665f5a6	bonding: comparing a u8 with -1 is always false slave->duplex is a u8 type so the in bond_info_show_slave() when we check "if (slave->duplex == -1)", it's always false. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-04 18:37:23 -04:00
Weiping Pan	98f41f694f	bonding:update speed/duplex for NETDEV_CHANGE Zheng Liang(lzheng@redhat.com) found a bug that if we config bonding with arp monitor, sometimes bonding driver cannot get the speed and duplex from its slaves, it will assume them to be 100Mb/sec and Full, please see /proc/net/bonding/bond0. But there is no such problem when uses miimon. (Take igb for example) I find that the reason is that after dev_open() in bond_enslave(), bond_update_speed_duplex() will call igb_get_settings() , but in that function, it runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1; because igb get an error value of status. So even dev_open() is called, but the device is not really ready to get its settings. Maybe it is safe for us to call igb_get_settings() only after this message shows up, that is "igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX". So I prefer to update the speed and duplex for a slave when reseices NETDEV_CHANGE/NETDEV_UP event. Changelog V2: 1 remove the "fake 100/Full" logic in bond_update_speed_duplex(), set speed and duplex to -1 when it gets error value of speed and duplex. 2 delete the warning in bond_enslave() if bond_update_speed_duplex() returns error. 3 make bond_info_show_slave() handle bad values of speed and duplex. Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-01 17:52:49 -04:00
Jay Vosburgh	e6d265e850	bonding: eliminate bond_close race conditions This patch resolves two sets of race conditions. Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> reported the first, as follows: The bond_close() calls cancel_delayed_work() to cancel delayed works. It, however, cannot cancel works that were already queued in workqueue. The bond_open() initializes work->data, and proccess_one_work() refers get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when work->data has been initialized. Thus, a panic occurs. He included a patch that converted the cancel_delayed_work calls in bond_close to flush_delayed_work_sync, which eliminated the above problem. His patch is incorporated, at least in principle, into this patch. In this patch, we use cancel_delayed_work_sync in place of flush_delayed_work_sync, and also convert bond_uninit in addition to bond_close. This conversion to _sync, however, opens new races between bond_close and three periodically executing workqueue functions: bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon. The race occurs because bond_close and bond_uninit are always called with RTNL held, and these workqueue functions may acquire RTNL to perform failover-related activities. If bond_close or bond_uninit is waiting in cancel_delayed_work_sync, deadlock occurs. These deadlocks are resolved by having the workqueue functions acquire RTNL conditionally. If the rtnl_trylock() fails, the functions reschedule and return immediately. For the cases that are attempting to perform link failover, a delay of 1 is used; for the other cases, the normal interval is used (as those activities are not as time critical). Additionally, the bond_mii_monitor function now stores the delay in a variable (mimicing the structure of activebackup_arp_mon). Lastly, all of the above renders the kill_timers sentinel moot, and therefore it has been removed. Tested-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-30 03:13:14 -04:00
Maciej Żenczykowski	59fdaca9a4	net: make bonding slaves honour master's skb->priority Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-25 19:22:23 -04:00
David S. Miller	1805b2f048	Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/net	2011-10-24 18:18:09 -04:00
Eric W. Biederman	4c22400ab6	bonding: Use a per netns implementation of /sys/class/net/bonding_masters. This fixes a network namespace misfeature that bonding_masters looked at current instead of the remembering the context where in which /sys/class/net/bonding_masters was opened in to see which network namespace to act upon. This removes the need for sysfs to handle tagged directories with untagged members allowing for a conceptually simpler sysfs implementation. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-19 19:24:15 -04:00
Mitsuo Hayasaka	4d97480b18	bonding: use local function pointer of bond->recv_probe in bond_handle_frame The bond->recv_probe is called in bond_handle_frame() when a packet is received, but bond_close() sets it to NULL. So, a panic occurs when both functions work in parallel. Why this happen: After null pointer check of bond->recv_probe, an sk_buff is duplicated and bond->recv_probe is called in bond_handle_frame. So, a panic occurs when bond_close() is called between the check and call of bond->recv_probe. Patch: This patch uses a local function pointer of bond->recv_probe in bond_handle_frame(). So, it can avoid the null pointer dereference. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-19 00:14:22 -04:00
David S. Miller	88c5100c28	Merge branch 'master' of github.com:davem330/net Conflicts: net/batman-adv/soft-interface.c	2011-10-07 13:38:43 -04:00
Andy Gospodarek	a0db2dad09	bonding: properly stop queuing work when requested During a test where a pair of bonding interfaces using ARP monitoring were both brought up and torn down (with an rmmod) repeatedly, a panic in the timer code was noticed. I tracked this down and determined that any of the bonding functions that ran as workqueue handlers and requeued more work might not properly exit when the module was removed. There was a flag protected by the bond lock called kill_timers that is set when the interface goes down or the module is removed, but many of the functions that monitor link status now unlock the bond lock to take rtnl first. There is a chance that another CPU running the rmmod could get the lock and set kill_timers after the first check has passed. This patch does not allow any function to queue work that will make itself run unless kill_timers is not set. I also noticed while doing this work that bond_resend_igmp_join_requests did not have a check for kill_timers, so I added the needed call there as well. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Reported-by: Liang Zheng <lzheng@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-03 13:48:20 -04:00
Jiri Pirko	4bc71cb983	net: consolidate and fix ethtool_ops->get_settings calling This patch does several things: - introduces __ethtool_get_settings which is called from ethtool code and from drivers as well. Put ASSERT_RTNL there. - dev_ethtool_get_settings() is replaced by __ethtool_get_settings() - changes calling in drivers so rtnl locking is respected. In iboe_get_rate was previously ->get_settings() called unlocked. This fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same problem. Also fixed by calling __dev_get_by_index() instead of dev_get_by_index() and holding rtnl_lock for both calls. - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create() so bnx2fc_if_create() and fcoe_if_create() are called locked as they are from other places. - use __ethtool_get_settings() in bonding code Signed-off-by: Jiri Pirko <jpirko@redhat.com> v2->v3: -removed dev_ethtool_get_settings() -added ASSERT_RTNL into __ethtool_get_settings() -prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock around it and __ethtool_get_settings() call v1->v2: add missing export_symbol Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits] Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-15 17:32:26 -04:00
David S. Miller	823dcd2506	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net	2011-08-20 10:39:12 -07:00
Jiri Pirko	afc4b13df1	net: remove use of ndo_set_multicast_list in drivers replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:22:03 -07:00
Jiri Pirko	d03462b999	bonding: use ndo_change_rx_flags callback Benefit from use of ndo_change_rx_flags in handling change of promisc and allmulti. No need to store previous state locally. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:17:47 -07:00
Peter Pan(潘卫平)	ba3211ccd0	bonding:reset backup and inactive flag of slave Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change bonding mode from active backup to round robin, some slaves are still keeping "backup", and won't transmit packets. As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around that by removing the bond_is_active_slave() check, because the "backup" flag is only meaningful for active backup mode. But if we just simply ignore the bond_is_active_slave() check, the transmission will work fine, but we can't maintain the correct value of "backup" flag for each slaves, though it is meaningless for other mode than active backup. I'd like to reset "backup" and "inactive" flag in bond_open, thus we can keep the correct value of them. As for bond_is_active_slave(), I'd like to prepare another patch to handle it. V2: Use C style comment. Move read_lock(&bond->curr_slave_lock). Replace restore with reset, for active backup mode, it means "restore", but for other modes, it means "reset". Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:12:06 -07:00
Jiri Pirko	d5da4510a7	bonding: implement get_tx_queues rtnk_link_op If bonding device is created via rtnl, it is created with default number of rx/tx queues. This patch implements callback in bonding so the correct value (previously specified by bonding module param) is used. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-11 07:44:38 -07:00
Andy Gospodarek	b2730f4f84	bonding: reduce noise during init On Tue, Jul 26, 2011 at 05:40:27PM -0700, Joe Perches wrote: > On Tue, 2011-07-26 at 17:37 -0700, Jay Vosburgh wrote: > > Joe Perches <joe@perches.com> wrote: > > >I'd prefer you don't separate the format string > > >into multiple pieces. > > Why not? To me, it looks easier to read split into sections > > that don't wrap lines. > > Harder to grep for a dmesg and the > defect rate of these split formats is > typically higher than single strings > because of bad spacing between string > segments. > I noticed that you took some time back in late 2009 to 'consolidate' the split format-strings present in the bonding driver at the time and I've decided I'm fine to leave them the way they are. The main point of my patch was to change the output and I would like to get that included. Here is my updated patch... Subject: [PATCH net-next-2.6 v2] bonding: reduce noise during init Many are using sysfs to configure bonding rather than module options, so there is no need for bonding to throw this warning in normal cases. Keep the message around when debugging is enabled as it might be useful for someone desperate enough to enable debugging, but eliminate it otherwise. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-27 22:39:30 -07:00
Neil Horman	550fd08c2c	net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared After the last patch, We are left in a state in which only drivers calling ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real hardware call ether_setup for their net_devices and don't hold any state in their skbs. There are a handful of drivers that violate this assumption of course, and need to be fixed up. This patch identifies those drivers, and marks them as not being able to support the safe transmission of skbs by clearning the IFF_TX_SKB_SHARING flag in priv_flags Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Karsten Keil <isdn@linux-pingi.de> CC: "David S. Miller" <davem@davemloft.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Patrick McHardy <kaber@trash.net> CC: Krzysztof Halasa <khc@pm.waw.pl> CC: "John W. Linville" <linville@tuxdriver.com> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Marcel Holtmann <marcel@holtmann.org> CC: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-27 22:39:30 -07:00
Jiri Pirko	cc0e407006	bonding: do vlan cleanup Now when all devices are cleaned up, bond can be cleaned up as well - remove bond->vlgrp - remove bond_vlan_rx_register - substitute necessary occurences of vlan_group_get_device Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-21 13:47:58 -07:00

1 2 3 4 5 ...

479 Commits