linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Alexander Duyck	754baf8dec	fib_trie: replace tnode_get_child functions with get_child macros I am replacing the tnode_get_child call with get_child since we are techically pulling the child out of a key_vector now and not a tnode. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:27 -05:00
Alexander Duyck	35c6edac19	fib_trie: Rename tnode to key_vector Rename the tnode to key_vector. The key_vector will be the eventual container for all of the information needed by either a leaf or a tnode. The final result should be much smaller than the 40 bytes currently needed for either one. This also updates the trie struct so that it contains an array of size 1 of tnode pointers. This is to bring the structure more inline with how an actual tnode itself is configured. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:27 -05:00
Alexander Duyck	8d8e810ca8	fib_trie: Return pointer to tnode pointer in resize/inflate/halve Resize related functions now all return a pointer to the pointer that references the object that was resized. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:27 -05:00
Alexander Duyck	72be72607a	fib_trie: Minor cleanups to fib_table_flush_external This change just does a couple of minor cleanups on fib_table_flush_external. Specifically it addresses the fact that resize was being called even though nothing was being removed from the table, and it drops an unecessary indent since we could just call continue on the inverse of the fi && flag check. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:49:27 -05:00
Punnaiah Choudary Kalluri	20488239d2	net: macb: Fix multi queue support for xilinx ZynqMP ZynqMP soc has single interrupt for all the queue events. So, passing the IRQF_SHARED flag for interrupt registration call. Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:47:48 -05:00
Punnaiah Choudary Kalluri	8a013a9c71	net: macb: Include multi queue support for xilinx ZynqMP ethernet version Include multi queue support for the ethernet IP version in xilinx ZynqMP SoC. Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:47:47 -05:00
David S. Miller	28c0f02ffe	Major changes: brcmfmac: * sdio improvements * add a debugfs file so users can provide us all the revinfo we could ask for iwlwifi: * add triggers for firmware dump collection * remove support for -9.ucode * new statitics API * rate control improvements ath9k: * add per-vif TX power capability * BT coexistance fixes ath10k: * qca6174: enable STA transmit beamforming (TxBF) support * disable multi-vif power save by default bcma: * enable support for PCIe Gen 2 host devices -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJU+dliAAoJEG4XJFUm622bsqQH/RO1Gxuw6hmiHPeeIcoDmlvt MZKvy6xcAiFqREfGwDxjVminlTZ7/MB9bABeaoQKzpQFpCJW/ftjIqwfbRqZWsvG 3IC0s2nPTwWU8YSsZTbifnyXCVNQDJuE+5nQ3hMO2rE/dZDi1zt1fS2hiSXtlASS kgBJcfXgoVxvhZ1WI+uVpbU0RtwXmI7tVylREE1sbgCrg7AuJx4Q2QmZ1GioPRLy 20HnFVFcIcbHk4eXVwAJOspdjctujoR858pg/oxlcVXWb7MOOCV/Fk8WMursZxFh qj/I/kbDcFYh3H5uC+6qL/kRByY80/yckLDiMbghA0QR5/PSx2nvp/UfkqIf008= =qgVl -----END PGP SIGNATURE----- Merge tag 'wireless-drivers-next-for-davem-2015-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Major changes: brcmfmac: * sdio improvements * add a debugfs file so users can provide us all the revinfo we could ask for iwlwifi: * add triggers for firmware dump collection * remove support for -9.ucode * new statitics API * rate control improvements ath9k: * add per-vif TX power capability * BT coexistance fixes ath10k: * qca6174: enable STA transmit beamforming (TxBF) support * disable multi-vif power save by default bcma: * enable support for PCIe Gen 2 host devices Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:46:08 -05:00
Willem de Bruijn	89650ad004	fib: make netdev_switch_fib_ipv4_abort in header file static inline When building without CONFIG_NET_SWITCHDEV, netdev_switch_fib_ipv4_abort is defined in the header file. It must be static inline to avoid build failure at link time. Fixes: `8e05fd7166` ("fib: hook IPv4 fib for hardware offload") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:44:35 -05:00
Robert Shearman	f8d54afc4c	mpls: Properly validate RTA_VIA payload length If the nla length is less than 2 then the nla data could be accessed beyond the accessible bounds. So ensure that the nla is big enough to at least read the via_family before doing so. Replace magic value of 2. Fixes: `03c0566542` ("mpls: Basic support for adding and removing routes") Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:19:06 -05:00
David S. Miller	5c4b934f89	Merge branch 'bcmgenet-next' Petri Gynther says: ==================== net: bcmgenet: preparation for multiple Rx queues Three small patches in preparation for supporting multiple Rx queues: 1. set hw_params->rx_queues = 0 2. adjust the call to alloc_etherdev_mqs() 3. add GENET_Q16_RX_BD_CNT and hw_params->rx_bds_per_q ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:13:47 -05:00
Petri Gynther	3feafa0215	net: bcmgenet: add GENET_Q16_RX_BD_CNT and hw_params->rx_bds_per_q In preparation for supporting multiple Rx queues, add GENET_Q16_RX_BD_CNT and hw_params->rx_bds_per_q. Signed-off-by: Petri Gynther <pgynther@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:13:41 -05:00
Petri Gynther	3feafeed16	net: bcmgenet: adjust the call to alloc_etherdev_mqs() In preparation for supporting multiple Rx queues, adjust the call to alloc_etherdev_mqs() to allow max GENET_MAX_MQ_CNT + 1 Rx queues. The actual number of Rx queues in use is correctly adjusted with: netif_set_real_num_rx_queues(priv->dev, priv->hw_params->rx_queues + 1); Signed-off-by: Petri Gynther <pgynther@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:13:41 -05:00
Petri Gynther	7e906e025d	net: bcmgenet: set hw_params->rx_queues = 0 bcmgenet driver doesn't yet support multiple Rx queues. Set hw_params->rx_queues = 0 accordingly. The default Rx queue (Q16) is still created and operational. Signed-off-by: Petri Gynther <pgynther@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:13:41 -05:00
David S. Miller	76f53bfdaa	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-03-06 This series contains updates to e1000, e1000e and igb. Yanir provides updates to e1000e based on the patches provided by John Linville. First updates the code comment to better describe the changes and the impact on the driver. Second removed calls to ioremap/unmap for i219 since this is only relevant to older hardware only. Starting with i219, the NVM will not be mapped to its one BAR but to a address region in another bar. Alex Duyck provides two fixes for igb, first fixes a compile warning where a variable may be used uninitialized, so Alex initializes it. Second fixes an issue where all of the pin register values were having to be pushed onto the stack each time the function was called, so to avoid this, Alex made them static const so that they should only need to be allocated once and we can avoid all the instructions to get them onto the stack. Eliezer found an issue in e1000 where we needed to be calling netif_carrier_off earlier in the down() to prevent the stack from queuing more packets to the interface. Sabrina Dubroca resolved a potential race condition by adding a dummy allocator. There was a race condition between e1000_change_mtu() cleanups and netpoll, when changing the MTU across jumbo sizes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 15:01:07 -05:00
David S. Miller	f0fdc80bd9	Merge branch 'pmtu-probe' Fan Du says: ==================== Improvements for TCP PMTU This patchset performs some improvements and enhancement for current TCP PMTU as per RFC4821 with the aim to find optimal mms size quickly, and also be adaptive to route changes like enlarged path MTU. Then TCP PMTU could be used to probe a effective pmtu in absence of ICMP message for tunnels(e.g. vxlan) across different networking stack. Patch1/4: Set probe mss base to 1024 Bytes per RFC4821 Patch2/4: Do not double probe_size for each probing, use a simple binary search to gain maximum performance. mss for next probing. Patch3/4: Create a probe timer to detect enlarged path MTU. Patch4/4: Update ip-sysctl.txt for new sysctl knobs. Changelog: v5: - Zero probe_size before resetting search range. - Update ip-sysctl.txt for new sysctl knobs. v4: - Convert probe_size to mss, not directly from search_low/high - Clamp probe_threshold - Don't adjust search_high in blackhole probe, so drop orignal patch3 v3: - Update commit message for patch2 - Fix pseudo timer delta calculation in patch4 v2: - Introduce sysctl_tcp_probe_threshold to control when probing will stop, as suggested by John Heffner. - Add patch3 to shrink current mss value for search low boundary. - Drop cannonical timer usages, implements pseudo timer based on 32bits jiffies tcp_time_stamp, as suggested by Eric Dumazet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:57:46 -05:00
Fan Du	fab4276084	ipv4: Documenting two sysctls for tcp PMTU probe Namely tcp_probe_interval to control how often to restart a probe. And tcp_probe_threshold to control when stop the probing in respect to the width of search range in bytes Signed-off-by: Fan Du <fan.du@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:57:42 -05:00
Fan Du	05cbc0db03	ipv4: Create probe timer for tcp PMTU as per RFC4821 As per RFC4821 7.3. Selecting Probe Size, a probe timer should be armed once probing has converged. Once this timer expired, probing again to take advantage of any path PMTU change. The recommended probing interval is 10 minutes per RFC1981. Probing interval could be sysctled by sysctl_tcp_probe_interval. Eric Dumazet suggested to implement pseudo timer based on 32bits jiffies tcp_time_stamp instead of using classic timer for such rare event. Signed-off-by: Fan Du <fan.du@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:57:42 -05:00
Fan Du	6b58e0a5f3	ipv4: Use binary search to choose tcp PMTU probe_size Current probe_size is chosen by doubling mss_cache, the probing process will end shortly with a sub-optimal mss size, and the link mtu will not be taken full advantage of, in return, this will make user to tweak tcp_base_mss with care. Use binary search to choose probe_size in a fine granularity manner, an optimal mss will be found to boost performance as its maxmium. In addition, introduce a sysctl_tcp_probe_threshold to control when probing will stop in respect to the width of search range. Test env: Docker instance with vxlan encapuslation(82599EB) iperf -c 10.0.0.24 -t 60 before this patch: 1.26 Gbits/sec After this patch: increase 26% 1.59 Gbits/sec Signed-off-by: Fan Du <fan.du@intel.com> Acked-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:57:41 -05:00
Fan Du	dcd8fb8533	ipv4: Raise tcp PMTU probe mss base size Quotes from RFC4821 7.2. Selecting Initial Values It is RECOMMENDED that search_low be initially set to an MTU size that is likely to work over a very wide range of environments. Given today's technologies, a value of 1024 bytes is probably safe enough. The initial value for search_low SHOULD be configurable. Moreover, set a small value will introduce extra time for the search to converge. So set the initial probe base mss size to 1024 Bytes. Signed-off-by: Fan Du <fan.du@intel.com> Acked-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:57:41 -05:00
Eric W. Biederman	aaa4e70404	DECnet: Only use neigh_ops for adding the link layer header Other users users of the neighbour table use neigh->output as the method to decided when and which link-layer header to place on a packet. DECnet has been using neigh->output to decide which DECnet headers to place on a packet depending which neighbour the packet is destined for. The DECnet usage isn't totally wrong but it can run into problems if the neighbour output function is run for a second time as the teql driver and the bridge netfilter code can do. Therefore to avoid pathologic problems later down the line and make the neighbour code easier to understand by refactoring the decnet output code to only use a neighbour method to add a link layer header to a packet. This is done by moving the neigbhour operations lookup from dn_to_neigh_output to dn_neigh_output_packet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:54:22 -05:00
Mahesh Bandewar	616f45416c	bonding: implement bond_poll_controller() This patches implements the poll_controller support for all bonding driver. If the slaves have poll_controller net_op defined, this implementation calls them. This is mode agnostic implementation and iterates through all slaves (based on mode) and calls respective handler. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 14:40:42 -05:00
Scott Feldman	8ea696384a	rocker: fix some sparse warnings Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 12:43:54 -05:00
Scott Feldman	e1315db17d	switchdev: fix CONFIG_IP_MULTIPLE_TABLES compile issue Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 12:43:54 -05:00
Sabrina Dubroca	08e8331654	e1000: add dummy allocator to fix race condition between mtu change and netpoll There is a race condition between e1000_change_mtu's cleanups and netpoll, when we change the MTU across jumbo size: Changing MTU frees all the rx buffers: e1000_change_mtu -> e1000_down -> e1000_clean_all_rx_rings -> e1000_clean_rx_ring Then, close to the end of e1000_change_mtu: pr_info -> ... -> netpoll_poll_dev -> e1000_clean -> e1000_clean_rx_irq -> e1000_alloc_rx_buffers -> e1000_alloc_frag And when we come back to do the rest of the MTU change: e1000_up -> e1000_configure -> e1000_configure_rx -> e1000_alloc_jumbo_rx_buffers alloc_jumbo finds the buffers already != NULL, since data (shared with page in e1000_rx_buffer->rxbuf) has been re-alloc'd, but it's garbage, or at least not what is expected when in jumbo state. This results in an unusable adapter (packets don't get through), and a NULL pointer dereference on the next call to e1000_clean_rx_ring (other mtu change, link down, shutdown): BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81194d6e>] put_compound_page+0x7e/0x330 [...] Call Trace: [<ffffffff81195445>] put_page+0x55/0x60 [<ffffffff815d9f44>] e1000_clean_rx_ring+0x134/0x200 [<ffffffff815da055>] e1000_clean_all_rx_rings+0x45/0x60 [<ffffffff815df5e0>] e1000_down+0x1c0/0x1d0 [<ffffffff811e2260>] ? deactivate_slab+0x7f0/0x840 [<ffffffff815e21bc>] e1000_change_mtu+0xdc/0x170 [<ffffffff81647050>] dev_set_mtu+0xa0/0x140 [<ffffffff81664218>] do_setlink+0x218/0xac0 [<ffffffff814459e9>] ? nla_parse+0xb9/0x120 [<ffffffff816652d0>] rtnl_newlink+0x6d0/0x890 [<ffffffff8104f000>] ? kvm_clock_read+0x20/0x40 [<ffffffff810a2068>] ? sched_clock_cpu+0xa8/0x100 [<ffffffff81663802>] rtnetlink_rcv_msg+0x92/0x260 By setting the allocator to a dummy version, netpoll can't mess up our rx buffers. The allocator is set back to a sane value in e1000_configure_rx. Fixes: `edbbb3ca10` ("e1000: implement jumbo receive with partial descriptors") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-06 02:47:10 -08:00
Eliezer Tamir	f9c029db70	e1000: call netif_carrier_off early on down When bringing down an interface netif_carrier_off() should be one the first things we do, since this will prevent the stack from queuing more packets to this interface. This operation is very fast, and should make the device behave much nicer when trying to bring down an interface under load. Also, this would Do The Right Thing (TM) if this device has some sort of fail-over teaming and redirect traffic to the other IF. Move netif_carrier_off as early as possible. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-06 02:47:10 -08:00
Alexander Duyck	b23c0cc5e8	igb: Make arrays on stack static const to avoid reallocation While addressing the pin problem I noticed that all of the pin register values where having to be pushed onto the stack each time the function was called. To avoid that I am making them static const so that they should only need to be allocated once and we can avoid all the instructions to get them onto the stack.. size before: text data bss dec hex filename 161477 10512 8 171997 29fdd drivers/net/ethernet/intel/igb/igb.ko size after: text data bss dec hex filename 161205 10512 8 171725 29ecd drivers/net/ethernet/intel/igb/igb.ko Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-06 02:47:09 -08:00
Alexander Duyck	e357f0aae4	igb: Fix warning pin may be used uninitialized When building the kernel using the gcc 4.8.3 compiler included in Fedora 20 I was repeatedly seeing the warning: drivers/net/ethernet/intel/igb/igb_ptp.c: In function ‘igb_ptp_feature_enable_i210’: drivers/net/ethernet/intel/igb/igb_ptp.c:395:21: warning: ‘pin’ may be used uninitialized in this function [-Wmaybe-uninitialized] tssdp &= ~ts_sdp_en[pin]; ^ drivers/net/ethernet/intel/igb/igb_ptp.c:471:6: note: ‘pin’ was declared here int pin; ^ To resolve it I am assigning the pin a value of -1 when it is instantiated. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-06 02:47:09 -08:00
Yanir Lubetkin	1103a631a8	e1000e: remove calls to ioremap/unmap for NVM addr Starting I219, the NVM will not be mapped to its own BAR, but to an address region in another bar. The mapping/unmapping is relevant to older HW only. CC: John W Linville <linville@tuxdriver.com> Reported-by: John W Linville <linville@tuxdriver.com> Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-06 02:47:08 -08:00
Yanir Lubetkin	9d17ce493a	e1000e: fix obscure comments The interface to the device flash was modified in i219 and later HW. This patch better describes the change and the impact on the driver. CC: John W Linville <linville@tuxdriver.com> Reported-by: John W Linville <linville@tuxdriver.com> Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-06 02:47:07 -08:00
David S. Miller	23375a0fd5	ipv4: Fix unused variable warnings in fib_table_flush_external. net/ipv4/fib_trie.c: In function ‘fib_table_flush_external’: net/ipv4/fib_trie.c:1572:6: warning: unused variable ‘found’ [-Wunused-variable] int found = 0; ^ net/ipv4/fib_trie.c:1571:16: warning: unused variable ‘slen’ [-Wunused-variable] unsigned char slen; ^ Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:38:35 -05:00
David S. Miller	fabe7bed11	Merge branch 'l3_hw_offload' Scott Feldman says: ==================== switchdev: add IPv4 routing offload v4: - Add NETIF_F_NETNS_LOCAL to rocker port feature list to keep rocker ports in the default netns. Rocker hardware can't be partitioned to support multiple namespaces, currently. It would be interesting to add netns support to rocker device by basically adding another match field to each table to match on some unique netns ID, with a port knowing it's netns ID. Future work TDB. - Up-level the RTNH_F_EXTERNAL marking of routes installed to offload device from driver to switchdev common code. Now driver can't skip routes. Either it can install the route or it cannot. Yes or No. If no on any route, all offloading is aborted by removing routes from offload device and setting ipv4.fib_offload_disabled so no more routes can be offloaded. This is harsh, but it's our starting point. We can refine the policies in follow-up work. - Add new net.ipv4.fib_offload_disabled bool that is set if anything goes wrong with route offloading. We can refine this later to make the setting per-device or per-device-port-netdev, but let's start here simple and refine in follow-up work. - Rebase against Alex's latest FIB changes. I think I did everything correctly, and didn't run into any issues with testing, but I'd like Alex to look over the changes and maybe follow-up with any cleanups. v3: Changes based on v2 review comments: - Move check for custom rules up earlier in patch set, to keep git bisect safe. - Simplify the route add/modify failure handling to simple try until failure, and then on failure, undo everything. The switchdev driver will return err when route can normally be installed to device, but the install fails for one reason or another (no space left on device, etc). If a failure happens, uninstall all routes from the device, punting forwarding for all routes back to the kernel. - Scan route's full nexthop list, ensuring all nexthop devs belong to the same switchdev device, otherwise don't try to install route to device. v2: Changes based on v1 review comments and discussions at netconf: - Allow route modification, but use same ndo op used for adding route. Driver/device is expected to modify route in-place, if it can, to avoid interruption of service. - Add new RTNH_F_EXTERNAL flag to mark FIB entries offloaded externally. - Don't offload routes if using custom IP rules. If routes are already offloaded, and custom IP rules are turned on, flush routes from offload device. (Offloaded routes are marked with RTNH_F_EXTERNAL). - Use kernel's neigh resolution code to resolve route's nexthops' neigh MAC addrs. (Thanks davem, works great!). - Use fib->fib_priority in rocker driver to give priorities to routes in OF-DPA unicast route table. v1: This patch set adds L3 routing offload support for IPv4 routes. The idea is to mirror routes installed in the kernel's FIB down to a hardware switch device to offload the data forwarding path for L3. Only the data forwarding path is intercepted. Control and management of the kernel's FIB remains with the kernel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:26:16 -05:00
Scott Feldman	c1beeef7a3	rocker: implement IPv4 fib offloading The driver implements ndo_switch_fib_ipv4_add/del ops to add/del/mod IPv4 routes to/from switchdev device. Once a route is added to the device, and the route's nexthops are resolved to neighbor MAC address, the device will forward matching pkts rather than the kernel. This offloads the L3 forwarding path from the kernel to the device. Note that control and management planes are still mananged by Linux; only the data plane is offloaded. Standard routing control protocols such as OSPF and BGP run on Linux and manage the kernel's FIB via standard rtm netlink msgs...nothing changes here. A new hash table is added to rocker to track neighbors. The driver listens for neighbor updates events using netevent notifier NETEVENT_NEIGH_UPDATE. Any ARP table updates for ports on this device are recorded in this table. Routes installed to the device with nexthops that reference neighbors in this table are "qualified". In the case of a route with nexthops not resolved in the table, the kernel is asked to resolve the nexthop. The driver uses fib_info->fib_priority for the priority field in rocker's unicast routing table. The device can only forward to pkts matching route dst to resolved nexthops. Currently, the device only supports single-path routes (i.e. routes with one nexthop). Equal Cost Multipath (ECMP) route support will be added in followup patches. This patch is driver support for unicast IPv4 routing only. Followup patches will add driver and infrastructure for IPv6 routing and multicast routing. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	8e05fd7166	fib: hook IPv4 fib for hardware offload Call into the switchdev driver any time an IPv4 fib entry is added/modified/deleted from the kernel's FIB. The switchdev driver may or may not install the route to the offload device. In the case where the driver tries to install the route and something goes wrong (device's routing table is full, etc), then all of the offloaded routes will be flushed from the device, route forwarding falls back to the kernel, and no more routes are offloading. We can refine this logic later. For now, use the simplist model of offloading routes up to the point of failure, and then on failure, undo everything and mark IPv4 offloading disabled. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	448b128a14	ipv4: add net bool fib_offload_disabled If something goes wrong with IPv4 FIB offload, mark entire net offload disabled. This is brute force policy to basically shut down IPv4 FIB offload permanently if there is a problem offloading any route to an external device. We can refine the policy in the future, to handle failures on a per-device or per-route basis, but for now, this policy is per-net. What we're trying to avoid is an inconsistent split between the kernel's FIB and the offload device's FIB. We don't want the device to fwd a pkt inconsitent with what the kernel would do. An example of a split is if device has 10.0.0.0/16 and kernel has 10.0.0.0/16 and 10.0.0.0/24, the device wouldn't see the longest prefix 10.0.0.0/24 and potentially forward pkts incorrectly. Limited capacity or limited capability are two ways a route may fail to install to the offload device. We'll not differentiate between failures at this time, and treat any failure as fatal and mark the net as fib_offload_disabled. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	b5d6fbdeed	switchdev: implement IPv4 fib ndo wrappers Flesh out ndo wrappers to call into device driver. To call into device driver, the wrapper must interate over route's nexthops to ensure all nexthop devs belong to the same switch device. Currently, there is no support for route's nexthops spanning offloaded and non-offloaded devices, or spanning ports of multiple offload devices. Since switch device ports may be stacked under virtual interfaces (bonds and/or bridges), and the route's nexthop may be on the virtual interface, the wrapper will traverse the nexthop dev down to the base dev. It's the base dev that's passed to the switchdev driver's ndo ops. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	104616e74e	switchdev: don't support custom ip rules, for now Keep switchdev FIB offload model simple for now and don't allow custom ip rules. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	5e8d90497d	switchdev: add IPv4 fib ndo ops wrappers Add IPv4 fib ndo wrapper funcs and stub them out for now. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	4586f1bb91	netdevice: add IPv4 fib add/del ops Add two new ndo ops for IPv4 fib offload support, add and del. Add uses modifiy semantics if fib entry already offloaded. Drivers implementing the new ndo ops will return err<0 if programming device fails, for example if device's tables are full. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:58 -05:00
Scott Feldman	37ed949369	rtnetlink: add RTNH_F_EXTERNAL flag for fib offload Add new RTNH_F_EXTERNAL flag to mark fib entries offloaded externally, for example to a switchdev switch device. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:24:57 -05:00
Eric Dumazet	24d2e4a507	tg3: use napi_complete_done() Using napi_complete_done() instead of napi_complete() allows us to use /sys/class/net/ethX/gro_flush_timeout GRO layer can aggregate more packets if the flush is delayed a bit, without having to set too big coalescing parameters that impact latencies. Tested: lpx:~# echo 0 >/sys/class/net/eth1/gro_flush_timeout lpx:~# sar -n DEV 1 10 \| grep eth1 10:36:25 AM eth1 81290.00 40617.00 120479.67 2777.01 0.00 0.00 0.00 10:36:26 AM eth1 81283.00 40608.00 120481.81 2778.13 0.00 0.00 1.00 10:36:27 AM eth1 81304.00 40639.00 120518.42 2778.28 0.00 0.00 0.00 10:36:28 AM eth1 81255.00 40605.00 120437.34 2775.95 0.00 0.00 1.00 10:36:29 AM eth1 81306.00 40630.00 120521.44 2777.70 0.00 0.00 0.00 10:36:30 AM eth1 81286.00 40564.00 120480.20 2773.31 0.00 0.00 0.00 10:36:31 AM eth1 81256.00 40599.00 120438.81 2776.27 0.00 0.00 0.00 10:36:32 AM eth1 81287.00 40594.00 120480.69 2776.69 0.00 0.00 0.00 10:36:33 AM eth1 81279.00 40601.00 120478.53 2775.84 0.00 0.00 0.00 10:36:34 AM eth1 81277.00 40610.00 120476.94 2776.25 0.00 0.00 0.00 Average: eth1 81282.30 40606.70 120479.39 2776.54 0.00 0.00 0.20 lpx:~# echo 13000 >/sys/class/net/eth1/gro_flush_timeout lpx:~# sar -n DEV 1 10 \| grep eth1 10:36:43 AM eth1 81257.00 7747.00 120437.44 530.00 0.00 0.00 0.00 10:36:44 AM eth1 81278.00 7748.00 120480.00 529.85 0.00 0.00 0.00 10:36:45 AM eth1 81282.00 7752.00 120479.09 531.09 0.00 0.00 0.00 10:36:46 AM eth1 81282.00 7751.00 120478.80 530.90 0.00 0.00 0.00 10:36:47 AM eth1 81276.00 7745.00 120478.31 529.64 0.00 0.00 0.00 10:36:48 AM eth1 81278.00 7747.00 120478.50 529.81 0.00 0.00 0.00 10:36:49 AM eth1 81282.00 7749.00 120478.88 530.01 0.00 0.00 0.00 10:36:50 AM eth1 81284.00 7751.00 120481.52 530.20 0.00 0.00 0.00 10:36:51 AM eth1 81299.00 7769.00 120481.74 533.81 0.00 0.00 0.00 10:36:52 AM eth1 81281.00 7748.00 120478.62 529.96 0.00 0.00 0.00 Average: eth1 81279.90 7750.70 120475.29 530.53 0.00 0.00 0.00 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:20:09 -05:00
David S. Miller	050f7dcd27	Merge branch 'dsa-next' Florian Fainelli says: ==================== net: dsa: code re-organization This pull request contains the first part of the patches required to implement the grand plan detailed here: http://www.spinics.net/lists/netdev/msg295942.html These are mostly code re-organization and function bodies re-arrangement to allow different callers of lower-level initialization functions for 'struct dsa_switch' and 'struct dsa_switch_tree' to be later introduced. There is no functional code change at this point. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:29 -05:00
Florian Fainelli	c86e59b9e6	net: dsa: extract dsa switch tree setup and removal Extract the core logic that setups a 'struct dsa_switch_tree' and removes it, update dsa_probe() and dsa_remove() to use the two helper functions. This will be useful to allow for other callers to setup this structure differently. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Florian Fainelli	5929903103	net: dsa: let switches specify their tagging protocol In order to support the new DSA device driver model, a dsa_switch should be able to advertise the type of tagging protocol supported by the underlying switch device. This also removes constraints on how tagging can be stacked to each other. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Florian Fainelli	df197195a5	net: dsa: split dsa_switch_setup into two functions Split the part of dsa_switch_setup() which is responsible for allocating and initializing a 'struct dsa_switch' and the part which is doing a given switch device setup and slave network device creation. This is a preliminary change to allow a separate caller of dsa_switch_setup_one() which may have externally initialized the dsa_switch structure, outside of dsa_switch_setup(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Florian Fainelli	b324c07ac4	net: dsa: allow deferred probing In preparation for allowing a different model to register DSA switches, update dsa_of_probe() and dsa_probe() to return -EPROBE_DEFER where appropriate. Failure to find a phandle or Device Tree property is still fatal, but looking up the internal device structure associated with a Device Tree node is something that might need to be delayed based on driver probe ordering. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Florian Fainelli	f1a26a062f	net: dsa: update dsa_of_{probe, remove} to use a device pointer In preparation for allowing a different mechanism to register DSA switch devices and driver, update dsa_of_probe and dsa_of_remove to take a struct device pointer since neither of these two functions uses the struct platform_device pointer. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-06 00:18:20 -05:00
Eric Dumazet	496127290f	inet_diag: remove duplicate code from inet_twsk_diag_dump() timewait sockets now share a common base with established sockets. inet_twsk_diag_dump() can use inet_diag_bc_sk() instead of duplicating code, granted that inet_diag_bc_sk() does proper userlocks initialization. twsk_build_assert() will catch any future changes that could break the assumptions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:55:44 -05:00
Jeff Kirsher	e815665e1a	i40e: Fix mismatching type for ioremap_len As pointed out by Ben Hutchings, ioremap uses unsigned long as its parameter type, so we should be using that instead of u32 or int. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:11:38 -05:00
Erik Hugne	d0f91938be	tipc: add ip/udp media type The ip/udp bearer can be configured in a point-to-point mode by specifying both local and remote ip/hostname, or it can be enabled in multicast mode, where links are established to all tipc nodes that have joined the same multicast group. The multicast IP address is generated based on the TIPC network ID, but can be overridden by using another multicast address as remote ip. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:08:42 -05:00
Erik Hugne	948fa2d115	tipc: increase size of tipc discovery messages The payload area following the TIPC discovery message header is an opaque area defined by the media. INT_H_SIZE was enough for Ethernet/IB/IPv4 but needs to be expanded to carry IPv6 addressing information. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:08:42 -05:00

1 2 3 4 5 ...

506449 Commits All Branches Search

506449 Commits

All Branches