OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
John Eaglesham	6b923cb718	bonding: support for IPv6 transmit hashing Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham <linux@8192.net> Signed-off-by: John Eaglesham <linux@8192.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:49:30 -07:00
David S. Miller	1304a7343b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-08-22 14:21:38 -07:00
Amerigo Wang	e15c3c2294	netpoll: check netpoll tx status on the right device Although this doesn't matter actually, because netpoll_tx_running() doesn't use the parameter, the code will be more readable. For team_dev_queue_xmit() we have to move it down to avoid compile errors. Cc: David Miller <davem@davemloft.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:32 -07:00
Amerigo Wang	38e6bc185d	netpoll: make __netpoll_cleanup non-block Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup() may be called with read_lock() held too, so we should make them non-block, by moving the cleanup and kfree() to call_rcu_bh() callbacks. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:30 -07:00
Amerigo Wang	47be03a28c	netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:30 -07:00
Amerigo Wang	b7bc2a5b5b	net: remove netdev_bonding_change() I don't see any benifits to use netdev_bonding_change() than using call_netdevice_notifiers() directly. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:28:24 -07:00
Jiri Pirko	8a540ff9e1	bond_sysfs: use real_num_tx_queues rather than params.tx_queue Since now number of tx queues can be specified during bond instance creation and therefore it may differ from params.tx_queues, use rather real_num_tx_queues for boundary check. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 11:07:00 -07:00
Jiri Pirko	df4ab5b3c2	net: rename bond_queue_mapping to slave_dev_queue_mapping As this is going to be used not only by bonding. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 11:07:00 -07:00
Jiri Pirko	d40156aa5e	rtnl: allow to specify different num for rx and tx queue count Also cut out unused function parameters and possible err in return value. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 11:06:59 -07:00
Eric Dumazet	b6fe83e952	bonding: refine IFF_XMIT_DST_RELEASE capability Some workloads greatly benefit of IFF_XMIT_DST_RELEASE capability on output net device, avoiding dirtying dst refcount. bonding currently disables IFF_XMIT_DST_RELEASE unconditionally. If all slaves have the IFF_XMIT_DST_RELEASE bit set, then bonding master can also have it in its priv_flags Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-18 09:31:25 -07:00
Jiri Pirko	30fdd8a082	netpoll: move np->dev and np->dev_name init into __netpoll_setup() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-17 09:02:36 -07:00
David S. Miller	04c9f416e3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/batman-adv/bridge_loop_avoidance.c net/batman-adv/bridge_loop_avoidance.h net/batman-adv/soft-interface.c net/mac80211/mlme.c With merge help from Antonio Quartulli (batman-adv) and Stephen Rothwell (drivers/net/usb/qmi_wwan.c). The net/mac80211/mlme.c conflict seemed easy enough, accounting for a conversion to some new tracing macros. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-10 23:56:33 -07:00
Eric W. Biederman	96ca7ffe74	bonding: debugfs and network namespaces are incompatible The bonding debugfs support has been broken in the presence of network namespaces since it has been added. The debugfs support does not handle multiple bonding devices with the same name in different network namespaces. I haven't had any bug reports, and I'm not interested in getting any. Disable the debugfs support when network namespaces are enabled. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-09 14:49:15 -07:00
Eric W. Biederman	a64d49c3dd	bonding: Manage /proc/net/bonding/ entries from the netdev events It was recently reported that moving a bonding device between network namespaces causes warnings from /proc. It turns out after the move we were trying to add and to remove the /proc/net/bonding entries from the wrong network namespace. Move the bonding /proc registration code into the NETDEV_REGISTER and NETDEV_UNREGISTER events where the proc registration and unregistration will always happen at the right time. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-09 14:49:15 -07:00
David S. Miller	e486463e82	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c net/batman-adv/translation-table.c net/ipv6/route.c qmi_wwan.c resolution provided by Bjørn Mork. batman-adv conflict is dealing merely with the changes of global function names to have a proper subsystem prefix. ipv6's route.c conflict is merely two side-by-side additions of network namespace methods. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-25 15:50:32 -07:00
Amerigo Wang	df2bcc4af2	bonding: show all the link status of slaves There are four link statuses of a bonding slave, the procfs code shows a wrong status when using downdelay/updelay: (slave->link == BOND_LINK_UP) ? "up" : "down" It doesn't respect the rest two statuses. This patch fixes it. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-17 16:23:54 -07:00
Eric Dumazet	0450243096	bonding: drop_monitor aware When packets are dropped in TX path, its better to use kfree_skb() instead of dev_kfree_skb() to give proper drop_monitor events. Also move the kfree_skb() call after read_unlock() in bond_alb_xmit() and bond_xmit_activebackup() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-13 16:00:26 -07:00
David S. Miller	43b03f1f6d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: MAINTAINERS drivers/net/wireless/iwlwifi/pcie/trans.c The iwlwifi conflict was resolved by keeping the code added in 'net' that turns off the buggy chip feature. The MAINTAINERS conflict was merely overlapping changes, one change updated all the wireless web site URLs and the other changed some GIT trees to be Johannes's instead of John's. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-12 21:59:18 -07:00
Eric Dumazet	de063b7040	bonding: remove packet cloning in recv_probe() Cloning all packets in input path have a significant cost. Use skb_header_pointer()/skb_copy_bits() instead of pskb_may_pull() so that recv_probe handlers (bond_3ad_lacpdu_recv / bond_arp_rcv / rlb_arp_recv ) dont touch input skb. bond_handle_frame() can avoid the skb_clone()/dev_kfree_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-12 18:51:09 -07:00
Eric Dumazet	5ee31c6898	bonding: Fix corrupted queue_mapping In the transmit path of the bonding driver, skb->cb is used to stash the skb->queue_mapping so that the bonding device can set its own queue mapping. This value becomes corrupted since the skb->cb is also used in __dev_xmit_skb. When transmitting through bonding driver, bond_select_queue is called from dev_queue_xmit. In bond_select_queue the original skb->queue_mapping is copied into skb->cb (via bond_queue_mapping) and skb->queue_mapping is overwritten with the bond driver queue. Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes the packet length into skb->cb, thereby overwriting the stashed queue mappping. In bond_dev_queue_xmit (called from hard_start_xmit), the queue mapping for the skb is set to the stashed value which is now the skb length and hence is an invalid queue for the slave device. If we want to save skb->queue_mapping into skb->cb[], best place is to add a field in struct qdisc_skb_cb, to make sure it wont conflict with other layers (eg : Qdiscc, Infiniband...) This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8 bytes : netem qdisc for example assumes it can store an u64 in it, without misalignment penalty. Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[]. The largest user is CHOKe and it fills it. Based on a previous patch from Tom Herbert. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Tom Herbert <therbert@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Roland Dreier <roland@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-12 15:29:21 -07:00
Weiping Pan	8a93664df9	bonding:record primary when modify it via sysfs If we modify primary via sysfs and it is not a valid slave, we should record it for future use, and this behavior is the same with bond_check_params(). Signed-off-by: Weiping Pan <wpan@redhat.com> Acked-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-12 15:23:11 -07:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
David S. Miller	b99215cdc6	bonding: Fix LACPDU rx_dropped commit. I applied the wrong version of Jiri's bonding fix in commit `13a8e0c8cd` ("bonding: don't increase rx_dropped after processing LACPDUs") I applied v3, which introduces warnings I asked him to fix, instead of v4 which properly takes care of those issues. This inter-diffs such that the warnings are now gone. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-13 15:45:13 -04:00
Joe Perches	a6700db179	net, drivers/net: Convert compare_ether_addr_64bits to ether_addr_equal_64bits Use the new bool function ether_addr_equal_64bits to add some clarity and reduce the likelihood for misuse of compare_ether_addr_64bits for sorting. Done via cocci script: $ cat compare_ether_addr_64bits.cocci @@ expression a,b; @@ - !compare_ether_addr_64bits(a, b) + ether_addr_equal_64bits(a, b) @@ expression a,b; @@ - compare_ether_addr_64bits(a, b) + !ether_addr_equal_64bits(a, b) @@ expression a,b; @@ - !ether_addr_equal_64bits(a, b) == 0 + ether_addr_equal_64bits(a, b) @@ expression a,b; @@ - !ether_addr_equal_64bits(a, b) != 0 + !ether_addr_equal_64bits(a, b) @@ expression a,b; @@ - ether_addr_equal_64bits(a, b) == 0 + !ether_addr_equal_64bits(a, b) @@ expression a,b; @@ - ether_addr_equal_64bits(a, b) != 0 + ether_addr_equal_64bits(a, b) @@ expression a,b; @@ - !!ether_addr_equal_64bits(a, b) + ether_addr_equal_64bits(a, b) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:33:01 -04:00
Joe Perches	2e42e4747e	drivers/net: Convert compare_ether_addr to ether_addr_equal Use the new bool function ether_addr_equal to add some clarity and reduce the likelihood for misuse of compare_ether_addr for sorting. Done via cocci script: $ cat compare_ether_addr.cocci @@ expression a,b; @@ - !compare_ether_addr(a, b) + ether_addr_equal(a, b) @@ expression a,b; @@ - compare_ether_addr(a, b) + !ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) == 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) != 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) == 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) != 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !!ether_addr_equal(a, b) + ether_addr_equal(a, b) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:33:01 -04:00
Jiri Bohac	13a8e0c8cd	bonding: don't increase rx_dropped after processing LACPDUs Since commit `3aba891d`, bonding processes LACP frames (802.3ad mode) with bond_handle_frame(). Currently a copy of the skb is made and the original is left to be processed by other rx_handlers and the rest of the network stack by returning RX_HANDLER_ANOTHER. As there is no protocol handler for PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped increased. Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED if bonding has processed the LACP frame. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:30:01 -04:00
Rick Jones	13b95fb714	bonding: bond_update_speed_duplex() can return void since no callers check its return As none of the callers of bond_update_speed_duplex (need to) check its return value, there is little point in it returning anything. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-27 00:03:35 -04:00
Michal Kubeček	f31c7937c2	bonding: start slaves with link down for ARP monitor Initialize slave device link state as down if ARP monitor is active and net_carrier_ok() returns zero. Also shift initial value of its last_arp_tx so that it doesn't immediately cause fake detection of "up" state. When ARP monitoring is used, initializing the slave device with up link state can cause ARP monitor to detect link failure before the device is really up (with igb driver, this can take more than two seconds). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-19 15:25:02 -04:00
David S. Miller	64d683c582	bonding: Fixup get_tx_queue() op second arg type. I missed this when fixing up the warning in the previous commit. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-13 15:02:35 -04:00
stephen hemminger	efacb309b5	rtnetlink & bonding: change args got get_tx_queues Change get_tx_queues, drop unsused arg/return value real_tx_queues, and use return by value (with error) rather than call by reference. Probably bonding should just change to LLTX and the whole get_tx_queues API could disappear! Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-13 13:31:00 -04:00
Veaceslav Falico	5a4309746c	bonding: properly unset current_arp_slave on slave link up When a slave comes up, we're unsetting the current_arp_slave without removing active flags from it, which can lead to situations where we have more than one slave with active flags in active-backup mode. To avoid this situation we must remove the active flags from a slave before removing it as a current_arp_slave. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-05 19:07:59 -04:00
Shlomo Pongratz	234bcf8a49	net/bonding: correctly proxy slave neigh param setup ndo function The current implemenation was buggy for slaves who use ndo_neigh_setup, since the networking stack invokes the bonding device ndo entry (from neigh_params_alloc) before any devices are enslaved, and the bonding driver can't further delegate the call at that point in time. As a result when bonding IPoIB devices, the neigh_cleanup hasn't been called. Fix that by deferring the actual call into the slave ndo_neigh_setup from the time the bonding neigh_setup is called. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-04 18:11:12 -04:00
Shlomo Pongratz	2af73d4b2a	net/bonding: emit address change event also in bond_release commit `7d26bb103c` "bonding: emit event when bonding changes MAC" didn't take care to emit the NETDEV_CHANGEADDR event in bond_release, where bonding actually changes the mac address (to all zeroes). As a result the neighbours aren't deleted by the core networking code (which does so upon getting that event). Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-04 18:11:12 -04:00
Linus Torvalds	ed359a3b7b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Provide device string properly for USB i2400m wimax devices, also don't OOPS when providing firmware string. From Phil Sutter. 2) Add support for sh_eth SH7734 chips, from Nobuhiro Iwamatsu. 3) Add another device ID to USB zaurus driver, from Guan Xin. 4) Loop index start in pool vector iterator is wrong causing MAC to not get configured in bnx2x driver, fix from Dmitry Kravkov. 5) EQL driver assumes HZ=100, fix from Eric Dumazet. 6) Now that skb_add_rx_frag() can specify the truesize increment separately, do so in f_phonet and cdc_phonet, also from Eric Dumazet. 7) virtio_net accidently uses net_ratelimit() not only on the kernel warning but also the statistic bump, fix from Rick Jones. 8) ip_route_input_mc() uses fixed init_net namespace, oops, use dev_net(dev) instead. Fix from Benjamin LaHaise. 9) dev_forward_skb() needs to clear the incoming interface index of the SKB so that it looks like a new incoming packet, also from Benjamin LaHaise. 10) iwlwifi mistakenly initializes a channel entry as 2GHZ instead of 5GHZ, fix from Stanislav Yakovlev. 11) Missing kmalloc() return value checks in orinoco, from Santosh Nayak. 12) ath9k doesn't check for HT capabilities in the right way, it is checking ht_supported instead of the ATH9K_HW_CAP_HT flag. Fix from Sujith Manoharan. 13) Fix x86 BPF JIT emission of 16-bit immediate field of AND instructions, from Feiran Zhuang. 14) Avoid infinite loop in GARP code when registering sysfs entries. From David Ward. 15) rose protocol uses memcpy instead of memcmp in a device address comparison, oops. Fix from Daniel Borkmann. 16) Fix build of lpc_eth due to dev_hw_addr_rancom() interface being renamed to eth_hw_addr_random(). From Roland Stigge. 17) Make ipv6 RTM_GETROUTE interpret RTA_IIF attribute the same way that ipv4 does. Fix from Shmulik Ladkani. 18) via-rhine has an inverted bit test, causing suspend/resume regressions. Fix from Andreas Mohr. 19) RIONET assumes 4K page size, fix from Akinobu Mita. 20) Initialization of imask register in sky2 is buggy, because bits are "or'd" into an uninitialized local variable. Fix from Lino Sanfilippo. 21) Fix FCOE checksum offload handling, from Yi Zou. 22) Fix VLAN processing regression in e1000, from Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits) sky2: dont overwrite settings for PHY Quick link tg3: Fix 5717 serdes powerdown problem net: usb: cdc_eem: fix mtu net: sh_eth: fix endian check for architecture independent usb/rtl8150 : Remove duplicated definitions rionet: fix page allocation order of rionet_active via-rhine: fix wait-bit inversion. ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4 net: lpc_eth: Fix rename of dev_hw_addr_random net/netfilter/nfnetlink_acct.c: use linux/atomic.h rose_dev: fix memcpy-bug in rose_set_mac_address Fix non TBI PHY access; a bad merge undid bug fix in a previous commit. net/garp: avoid infinite loop if attribute already exists x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of AND bonding: emit event when bonding changes MAC mac80211: fix oper channel timestamp updation ath9k: Use HW HT capabilites properly MAINTAINERS: adding maintainer for ipw2x00 net: orinoco: add error handling for failed kmalloc(). net/wireless: ipw2x00: fix a typo in wiphy struct initilization ...	2012-04-02 17:53:39 -07:00
Weiping Pan	7d26bb103c	bonding: emit event when bonding changes MAC When a bonding device is configured with fail_over_mac=active, we expect to see the MAC address of the new active slave as the source MAC address after failover. But we see that the source MAC address is the MAC address of previous active slave. Emit NETDEV_CHANGEADDR event when bonding changes its MAC address, in order to let arp_netdev_event flush neighbour cache and route cache. How to reproduce this bug ? -----------hostB---------------- hostA ----- switch ---\|-- eth0--bond0(192.168.100.2/24)\| (192.168.100.1/24 \--\|-- eth1-/ \| -------------------------------- 1 on hostB, modprobe bonding mode=1 miimon=500 fail_over_mac=active downdelay=1000 num_grat_arp=1 ifconfig bond0 192.168.100.2/24 up ifenslave bond0 eth0 ifenslave bond0 eth1 then eth0 is the active slave, and MAC of bond0 is MAC of eth0. 2 on hostA, ping 192.168.100.2 3 on hostB, tcpdump -i bond0 -p icmp -XXX you will see bond0 uses MAC of eth0 as source MAC in icmp reply. 4 on hostB, ifconfig eth0 down tcpdump -i bond0 -p icmp -XXX (just keep it running in step 3) you will see first bond0 uses MAC of eth1 as source MAC in icmp reply, then it will use MAC of eth0 as source MAC. Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-29 18:11:57 -04:00
Linus Torvalds	0195c00244	Disintegrate and delete asm/system.h -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+ /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD NypQthI85pc= =G9mT -----END PGP SIGNATURE----- Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...	2012-03-28 15:58:21 -07:00
David Howells	9ffc93f203	Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\sinclude\s<asm/system[.]h>.\n!!' `grep -Irl '^#\sinclude\s<asm/system[.]h>' ` Signed-off-by: David Howells <dhowells@redhat.com>	2012-03-28 18:30:03 +01:00
Andy Gospodarek	eaddcd7690	bonding: remove entries for master_ip and vlan_ip and query devices instead The following patch aimed to resolve an issue where secondary, tertiary, etc. addresses added to bond interfaces could overwrite the bond->master_ip and vlan_ip values. commit `917fbdb32f` Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com> Date: Wed Nov 23 23:37:15 2011 +0000 bonding: only use primary address for ARP That patch was good because it prevented bonds using ARP monitoring from sending frames with an invalid source IP address. Unfortunately, it didn't always work as expected. When using an ioctl (like ifconfig does) to set the IP address and netmask, 2 separate ioctls are actually called to set the IP and netmask if the mask chosen doesn't match the standard mask for that class of address. The first ioctl did not have a mask that matched the one in the primary address and would still cause the device address to be overwritten. The second ioctl that was called to set the mask would then detect as secondary and ignored, but the damage was already done. This was not an issue when using an application that used netlink sockets as the setting of IP and netmask came down at once. The inconsistent behavior between those two interfaces was something that needed to be resolved. While I was thinking about how I wanted to resolve this, Ralf Zeidler came with a patch that resolved this on a RHEL kernel by keeping a full shadow of the entries in dev->ifa_list for the bonding device and vlan devices in the bonding driver. I didn't like the duplication of the list as I want to see the 'bonding' struct and code shrink rather than grow, but liked the general idea. As the Subject indicates this patch drops the master_ip and vlan_ip elements from the 'bonding' and 'vlan_entry' structs, respectively. This can be done because a device's address-list is now traversed to determine the optimal source IP address for ARP requests and for checks to see if the bonding device has a particular IP address. This code could have all be contained inside the bonding driver, but it made more sense to me to EXPORT and call inet_confirm_addr since it did exactly what was needed. I tested this and a backported patch and everything works as expected. Ralf also helped with verification of the backported patch. Thanks to Ralf for all his help on this. v2: Whitespace and organizational changes based on suggestions from Jay Vosburgh and Dave Miller. v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's suggestion. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> CC: Ralf Zeidler <ralf.zeidler@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-22 22:36:17 -04:00
Peter Pan(潘卫平)	1c3ac4289a	bonding: send igmp report for its master Liang Zheng(lzheng@redhat.com) found that in the following topo, bonding does not send igmp report when we trigger a fail-over of bonding. eth0-- \|-- bond0 -- br0 eth1-- modprobe bonding mode=1 miimon=100 resend_igmp=10 ifconfig bond0 up ifenslave bond0 eth0 eth1 brctl addbr br0 ifconfig br0 192.168.100.2/24 up brctl addif br0 bond0 Add 192.168.100.2(br0) into a multicast group, like 224.10.10.10, then trigger a fali-over in bonding. You can see that parameter "resend_igmp" does not work. The reason is that when we add br0 into a multicast group, it does not propagate multicast knowledge down to its ports. If we choose to propagate multicast knowledge down to all ports for bridge, then we have to track every change that is done to bridge, and keep a backup for all ports. It is hard to track, I think. Instead I choose to modify bonding to send igmp report for its master. Changelog: V2: correct comments V3: move this check into bond_resend_igmp_join_requests() V4: only send igmp reports if bond is enslaved to a bridge Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-19 18:02:05 -04:00
Jesper Juhl	b9d6d2dbf4	bonding: Fix misspelling of "since" Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-05 22:42:00 -05:00
Joe Perches	e404decb0f	drivers/net: Remove unnecessary k.alloc/v.alloc OOM messages alloc failures use dump_stack so emitting an additional out-of-memory message is an unnecessary duplication. Remove the allocation failure messages. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-31 16:20:21 -05:00
Jiri Bohac	b924551bed	bonding: fix enslaving in alb mode when link down bond_alb_init_slave() is called from bond_enslave() and sets the slave's MAC address. This is done differently for TLB and ALB modes. bond->alb_info.rlb_enabled is used to discriminate between the two modes but this flag may be uninitialized if the slave is being enslaved prior to calling bond_open() -> bond_alb_initialize() on the master. It turns out all the callers of alb_set_slave_mac_addr() pass bond->alb_info.rlb_enabled as the hw parameter. This patch cleans up the unnecessary parameter of alb_set_slave_mac_addr() and makes the function decide based on the bonding mode instead, which fixes the above problem. Reported-by: Narendra K <Narendra_K@Dell.com> Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-18 20:59:53 -05:00
Maxim Uvarov	f515e6b770	bond_alb: don't disable softirq under bond_alb_xmit No need to lock soft irqs under bond_alb_xmit() which already has softirq disabled. Changes: 1. add non-bh/bh version to tlb_clear_slave() 2. represent BH and non BH hash table locks _lock_rx_hashtbl_bh/_unlock_rx_hashtbl_bh _lock_rx_hashtbl/_unlock_rx_hashtbl _lock_tx_hashtbl_bh/_unlock_tx_hashtbl_bh _lock_tx_hashtbl/_unlock_tx_hashtbl Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-11 12:52:26 -08:00
Linus Torvalds	7affca3537	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits) arm: fix up some samsung merge sysdev conversion problems firmware: Fix an oops on reading fw_priv->fw in sysfs loading file Drivers:hv: Fix a bug in vmbus_driver_unregister() driver core: remove __must_check from device_create_file debugfs: add missing #ifdef HAS_IOMEM arm: time.h: remove device.h #include driver-core: remove sysdev.h usage. clockevents: remove sysdev.h arm: convert sysdev_class to a regular subsystem arm: leds: convert sysdev_class to a regular subsystem kobject: remove kset_find_obj_hinted() m86k: gpio - convert sysdev_class to a regular subsystem mips: txx9_sram - convert sysdev_class to a regular subsystem mips: 7segled - convert sysdev_class to a regular subsystem sh: dma - convert sysdev_class to a regular subsystem sh: intc - convert sysdev_class to a regular subsystem power: suspend - convert sysdev_class to a regular subsystem power: qe_ic - convert sysdev_class to a regular subsystem power: cmm - convert sysdev_class to a regular subsystem s390: time - convert sysdev_class to a regular subsystem ... Fix up conflicts with 'struct sysdev' removal from various platform drivers that got changed: - arch/arm/mach-exynos/cpu.c - arch/arm/mach-exynos/irq-eint.c - arch/arm/mach-s3c64xx/common.c - arch/arm/mach-s3c64xx/cpu.c - arch/arm/mach-s5p64x0/cpu.c - arch/arm/mach-s5pv210/common.c - arch/arm/plat-samsung/include/plat/cpu.h - arch/powerpc/kernel/sysfs.c and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h	2012-01-07 12:03:30 -08:00
Greg Kroah-Hartman	ff4b8a57f0	Merge branch 'driver-core-next' into Linux 3.2 This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <sfr@canb.auug.org.au> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-06 11:42:52 -08:00
stephen hemminger	f7d9821a6a	bonding: fix error handling if slave is busy (v2) If slave device already has a receive handler registered, then the error unwind of bonding device enslave function is broken. The following will leave a pointer to freed memory in the slave device list, causing a later kernel panic. # modprobe dummy # ip li add dummy0-1 link dummy0 type macvlan # modprobe bonding # echo +dummy0 >/sys/class/net/bond0/bonding/slaves The fix is to detach the slave (which removes it from the list) in the unwind path. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-03 12:49:16 -05:00
Kay Sievers	edbaa603eb	driver-core: remove sysdev.h usage. The sysdev.h file should not be needed by any in-kernel code, so remove the .h file from these random files that seem to still want to include it. The sysdev code will be going away soon, so this include needs to be removed no matter what. Cc: Jiandong Zheng <jdzheng@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Kukjin Kim <kgene.kim@samsung.com> Cc: David Brown <davidb@codeaurora.org> Cc: Daniel Walker <dwalker@fifo99.com> Cc: Bryan Huntsman <bryanh@codeaurora.org> Cc: Ben Dooks <ben-linux@fluff.org> Cc: Wan ZongShun <mcuos.com@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "Venkatesh Pallipadi Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>	2011-12-21 16:26:03 -08:00
Jiri Pirko	87002b03ba	net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls This patch adds wrapper for ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid functions. Check for NETIF_F_HW_VLAN_FILTER feature is done in this wrapper. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-08 19:52:42 -05:00
Jiri Pirko	8e586137e6	net: make vlan ndo_vlan_rx_[add/kill]_vid return error value Let caller know the result of adding/removing vlan id to/from vlan filter. In some drivers I make those functions to just return 0. But in those where there is able to see if hw setup went correctly, return value is set appropriately. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-08 19:52:37 -05:00
David S. Miller	b3613118eb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2011-12-02 13:49:21 -05:00

1 2 3 4 5 ...

555 Commits