OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
David S. Miller	fcd2b0da73	Merge branch 'rds-ha-failover-fixes' Sowmini Varadhan says: ==================== RDS: TCP: HA/Failover fixes This series contains a set of fixes for bugs exposed when we ran the following in a loop between a test machine pair: while (1); do # modprobe rds-tcp on test nodes # run rds-stress in bi-dir mode between test machine pair # modprobe -r rds-tcp on test nodes done rds-stress in bi-dir mode will cause both nodes to initiate RDS-TCP connections at almost the same instant, exposing the bugs fixed in this series. Without the fixes, rds-stress reports sporadic packet drops, and packets arriving out of sequence. After the fixes,we have been able to run the test overnight, without any issues. Each patch has a detailed description of the root-cause fixed by the patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 13:35:19 -05:00
Sowmini Varadhan	1a0e100fb2	RDS: TCP: Force every connection to be initiated by numerically smaller IP address When 2 RDS peers initiate an RDS-TCP connection simultaneously, there is a potential for "duelling syns" on either/both sides. See commit `241b271952` ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()") for a description of this condition, and the arbitration logic which ensures that the numerically large IP address in the TCP connection is bound to the RDS_TCP_PORT ("canonical ordering"). The rds_connection should not be marked as RDS_CONN_UP until the arbitration logic has converged for the following reason. The sender may start transmitting RDS datagrams as soon as RDS_CONN_UP is set, and since the sender removes all datagrams from the rds_connection's cp_retrans queue based on TCP acks. If the TCP ack was sent from a tcp socket that got reset as part of duel aribitration (but before data was delivered to the receivers RDS socket layer), the sender may end up prematurely freeing the datagram, and the datagram is no longer reliably deliverable. This patch remedies that condition by making sure that, upon receipt of 3WH completion state change notification of TCP_ESTABLISHED in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP if, and only if, the IP addresses and ports for the connection are canonically ordered. In all other cases, rds_tcp_state_change will force an rds_conn_path_drop(), and rds_queue_reconnect() on both peers will restart the connection to ensure canonical ordering. A side-effect of enforcing this condition in rds_tcp_state_change() is that rds_tcp_accept_one_path() can now be refactored for simplicity. It is also no longer possible to encounter an RDS_CONN_UP connection in the arbitration logic in rds_tcp_accept_one(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 13:35:18 -05:00
Sowmini Varadhan	905dd4184e	RDS: TCP: Track peer's connection generation number The RDS transport has to be able to distinguish between two types of failure events: (a) when the transport fails (e.g., TCP connection reset) but the RDS socket/connection layer on both sides stays the same (b) when the peer's RDS layer itself resets (e.g., due to module reload or machine reboot at the peer) In case (a) both sides must reconnect and continue the RDS messaging without any message loss or disruption to the message sequence numbers, and this is achieved by rds_send_path_reset(). In case (b) we should reset all rds_connection state to the new incarnation of the peer. Examples of state that needs to be reset are next expected rx sequence number from, or messages to be retransmitted to, the new incarnation of the peer. To achieve this, the RDS handshake probe added as part of commit `5916e2c155` ("RDS: TCP: Enable multipath RDS for TCP") is enhanced so that sender and receiver of the RDS ping-probe will add a generation number as part of the RDS_EXTHDR_GEN_NUM extension header. Each peer stores local and remote generation numbers as part of each rds_connection. Changes in generation number will be detected via incoming handshake probe ping request or response and will allow the receiver to reset rds_connection state. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 13:35:18 -05:00
Sowmini Varadhan	315ca6d98e	RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list As noted in rds_recv_incoming() sequence numbers on data packets can decreas for the failover case, and the Rx path is equipped to recover from this, if the RDS_FLAG_RETRANSMITTED is set on the rds header of an incoming message with a suspect sequence number. The RDS_FLAG_RETRANSMITTED is predicated on the RDS_FLAG_RETRANSMITTED flag in the rds_message, so make sure the flag is set on messages queued for retransmission. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 13:35:18 -05:00
LABBE Corentin	b3e5106962	net: stmmac: replace if (netif_msg_type) by their netif_xxx counterpart As sugested by Joe Perches, we could replace all if (netif_msg_type(priv)) dev_xxx(priv->devices, ...) by the simpler macro netif_xxx(priv, hw, priv->dev, ...) Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 13:30:30 -05:00
LABBE Corentin	de9a2165a5	net: stmmac: replace hardcoded function name by __func__ Some printing have the function name hardcoded. It is better to use __func__ instead. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 13:30:30 -05:00
LABBE Corentin	38ddc59d65	net: stmmac: replace all pr_xxx by their netdev_xxx counterpart The stmmac driver use lots of pr_xxx functions to print information. This is bad since we cannot know which device logs the information. (moreover if two stmmac device are present) Furthermore, it seems that it assumes wrongly that all logs will always be subsequent by using a dev_xxx then some indented pr_xxx like this: kernel: sun7i-dwmac 1c50000.ethernet: no reset control found kernel: Ring mode enabled kernel: No HW DMA feature register supported kernel: Normal descriptors kernel: TX Checksum insertion supported So this patch replace all pr_xxx by their netdev_xxx counterpart. Excepts for some printing where netdev "cause" unpretty output like: sun7i-dwmac 1c50000.ethernet (unnamed net_device) (uninitialized): no reset control found In those case, I keep dev_xxx. In the same time I remove some "stmmac:" print since this will be a duplicate with that dev_xxx displays. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 13:30:30 -05:00
Eric Dumazet	29c58472ec	net_sched: sch_fq: use hash_ptr() When I wrote sch_fq.c, hash_ptr() on 64bit arches was awful, and I chose hash_32(). Linus Torvalds and George Spelvin fixed this issue, so we can use hash_ptr() to get more entropy on 64bit arches with Terabytes of memory, and avoid the cast games. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 13:28:56 -05:00
Eric Dumazet	d30d9ccbfa	net/mlx5e: remove napi_hash_del() calls Calling napi_hash_del() after netif_napi_del() is pointless. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 12:06:20 -05:00
Eric Dumazet	bb07fafaec	net/mlx4_en: remove napi_hash_del() call There is no need calling napi_hash_del()+synchronize_rcu() before calling netif_napi_del() netif_napi_del() does this already. Using napi_hash_del() in a driver is useful only when dealing with a batch of NAPI structures, so that a single synchronize_rcu() can be used. mlx4_en_deactivate_cq() is deactivating a single NAPI. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-17 11:17:05 -05:00
David S. Miller	6a02f5eb6a	Merge branch 'mlxsw-i2c' Jiri Pirko says: ==================== mlxsw: Introduce support for I2C bus Vadim says: This patchset adds I2C access support for SwitchX, SwitchX2, SwitchIB, SwitchIB2 and Spectrum silicones. It contains: - Small changes in mlxsw core code, needed for I2C bus support; - I2C driver, which obtains I2C input/output mailboxes setting and provides command interface implementation. - Minimal driver, which works on top of I2C driver and allows running of mlxsw command interface over I2C bus; Use case: On system, which does not have PCI to ASIC (BMC), hwmon functionality (sensors, pwm, tacho) will be available through I2C. Usage (manual probing): echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device Sysfs interface: /sys/bus/i2c/devices/2-0048/hwmon/hwmon5/pwm1 /sys/bus/i2c/devices/2-0048/hwmon/hwmon5/temp1_input ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:29:05 -05:00
Vadim Pasternak	d556e92916	mlxsw: minimal: Add I2C support for Mellanox ASICs Add I2C access support for Mellanox ASICs: - Virtual Protocol Interconnect switches SwitchX, SwitchX2, providing InfiniBand, Ethernet and Fibre Channel connectivity; - Infiniband switches SwitchIB, SwitchIB2: - Ethernet switch Spectrum. Example of probing activation: echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:29:04 -05:00
Vadim Pasternak	4ebd00bc5f	mlxsw: Invoke driver's init/fini methods only if defined We are going to add a minimal driver on top of the mlxsw core infrastructure, which will be mainly used for hardware monitoring in Baseboard management controller (BMC) installations. Unlike the switch drivers (e.g., spectrum, switchx2), this driver does not initialize the ASIC and therefore doesn't need to implement the init() and fini() methods in its 'mlxsw_driver' struct. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:29:04 -05:00
Vadim Pasternak	6882b0aee1	mlxsw: Introduce support for I2C bus Add I2C bus implementation for Mellanox Technologies Switch ASICs. This includes command interface implementation using input / out mailboxes, whose location is retrieved from the firmware during probe time. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:29:04 -05:00
Vadim Pasternak	c711e27a18	mlxsw: Add bus capability flag The mlxsw core infrastructure currently assumes that communication with the ASIC is always possible using Ethernet management datagrams (EMADs), but this is only possible when the PCI bus is used. The bus capability flag is added to indicate EMAD support and make core initialize EMAD communication only when it's set. Otherwise, register access is done using command interface. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:29:04 -05:00
Julia Lawall	d01eb808c7	net: netcp: replace IS_ERR_OR_NULL by IS_ERR knav_queue_open always returns an ERR_PTR value, never NULL. This can be confirmed by unfolding the function calls and conforms to the function's documentation. Thus, replace IS_ERR_OR_NULL by IS_ERR in error checks. The change is made using the following semantic patch: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; statement S; @@ x = knav_queue_open(...); if ( - IS_ERR_OR_NULL + IS_ERR (x)) S // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:26:36 -05:00
Xin Long	7fda702f93	sctp: use new rhlist interface on sctp transport rhashtable Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain. It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there. The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain. This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:22:17 -05:00
David S. Miller	00a636e776	Merge branch 'bnxt_en-next' Michael Chan says: ==================== bnxt_en: Updates. New firmware spec. update, autoneg update, and UDP RSS support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:11:07 -05:00
Michael Chan	a011952a1a	bnxt_en: Add ethtool -n\|-N rx-flow-hash support. To display and modify the RSS hash. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:11:07 -05:00
Michael Chan	87da7f796d	bnxt_en: Add UDP RSS support for 57X1X chips. The newer chips have proper support for 4-tuple UDP RSS. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:11:07 -05:00
Michael Chan	286ef9d64e	bnxt_en: Enhance autoneg support. On some dual port NICs, the speed setting on one port can affect the available speed on the other port. Add logic to detect these changes and adjust the advertised speed settings when necessary. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:11:07 -05:00
Michael Chan	16d663a69f	bnxt_en: Update firmware interface spec to 1.5.4. Use the new FORCE_LINK_DWN bit to shutdown link during close. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 23:11:07 -05:00
Eric Dumazet	89c4b442b7	netpoll: more efficient locking Callers of netpoll_poll_lock() own NAPI_STATE_SCHED Callers of netpoll_poll_unlock() have BH blocked between the NAPI_STATE_SCHED being cleared and poll_lock is released. We can avoid the spinlock which has no contention, and use cmpxchg() on poll_owner which we need to set anyway. This removes a possible lockdep violation after the cited commit, since sk_busy_loop() re-enables BH before calling busy_poll_stop() Fixes: `217f697436` ("net: busy-poll: allow preemption in sk_busy_loop()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 18:32:02 -05:00
Rafal Ozieblo	1629dd4f76	cadence: Add LSO support. New Cadence GEM hardware support Large Segment Offload (LSO): TCP segmentation offload (TSO) as well as UDP fragmentation offload (UFO). Support for those features was added to the driver. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 17:54:55 -05:00
Arnd Bergmann	08348995c4	netronome: don't access real_num_rx_queues directly The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS is enabled, so we now get a build failure when that is turned off: netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable': netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'? As far as I can tell, the check here is only used as an optimization that we can skip in order to fix the compilation. If sysfs is disabled, the following netif_set_real_num_rx_queues() has no effect. Fixes: `164d1e9e5d` ("nfp: add support for ethtool .set_channels") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 17:05:48 -05:00
Eric Dumazet	973334a1d3	sfc: remove napi_hash_del() call Calling napi_hash_del() after netif_napi_del() is pointless. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Edward Cree <ecree@solarflare.com> Cc: Bert Kenward <bkenward@solarflare.com> Acked-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 17:04:49 -05:00
David Lebrun	a23a8f5bc1	lwtunnel: subtract tunnel headroom from mtu on output redirect This patch changes the lwtunnel_headroom() function which is called in ipv4_mtu() and ip6_mtu(), to also return the correct headroom value when the lwtunnel state is OUTPUT_REDIRECT. This patch enables e.g. SR-IPv6 encapsulations to work without manually setting the route mtu. Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 17:01:15 -05:00
Ido Schimmel	d331d30360	mlxsw: spectrum_router: Adjust placement of FIB abort warning The recent merge commit `bb598c1b8c` ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") would cause the FIB abort warning to fire whenever we flush the FIB tables - either during module removal or actual abort. Move it back to its rightful location in the FIB abort function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 15:18:01 -05:00
Andrew Lunn	0b6e3d0322	net: dsa: mv88e6xxx: Respect SPEED_UNFORCED, don't set force bit The SPEED_UNFORCED indicates the MAC & PHY should perform auto-negotiation to determine a speed which works. If this is called for, don't set the force bit. If it is set, the MAC actually does 10Gbps, why the internal PHYs don't support. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 15:12:51 -05:00
David S. Miller	a11485b737	Merge branch 'amd-xgbe-next' Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver updates 2016-11-15 This patch series addresses some minor issues found in the recently accepted patch series for the AMD XGBE driver. The following fixes are included in this driver update series: - Fix a possibly uninitialized variable in the debugfs support - Fix the GPIO pin number constraint check This patch series is based on net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:57:45 -05:00
Lendacky, Thomas	1c1f619e45	amd-xgbe: Fix maximum GPIO value check The GPIO support in the hardware allows for up to 16 GPIO pins, enumerated from 0 to 15. The driver uses the wrong value (16) to validate the GPIO pin range in the routines to set and clear the GPIO output pins. Update the code to use the correct value (15). Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:57:44 -05:00
Lendacky, Thomas	5e4cbaa7fb	amd-xgbe: Fix possible uninitialized variable The debugfs support in the driver uses a common routine to write the debugfs values. In this routine, if the input file position is non-zero then the write routine will not return an error and an output parameter will not have been set. Because an error isn't returned an uninitialized value will be written into a register. Fix the common write routine to return an error if the input file position is non-zero, which will propagate back to the caller. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:57:44 -05:00
David S. Miller	54dbf3a577	Merge branch 'nway-reset' Florian Fainelli says: ==================== net: Implenent ethtool::nway_reset for a few drivers This patch series depends on "net: phy: Centralize auto-negotation restart" since it provides phy_ethtool_nway_reset as a helper function. The drivers here already support PHYLIB, so there really is no reason why restarting auto-negotiation would not be possible with these. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:44:02 -05:00
Florian Fainelli	13f0ac4109	net: ethernet: marvell: pxa168_eth: Implement ethtool::nway_reset Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are already using dev->phydev all over the place so this comes for free. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:44:01 -05:00
Florian Fainelli	00606c4986	net: ethernet: mvpp2: Implement ethtool::nway_reset Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are already using dev->phydev all over the place so this comes for free. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:44:01 -05:00
Florian Fainelli	5489ee8976	net: ethernet: mvneta: Implement ethtool::nway_reset Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are already using dev->phydev all over the place so this comes for free. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:44:01 -05:00
Florian Fainelli	3d3ba5685e	net: ethoc: Implement ethtool::nway_reset Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are already using dev->phydev all over the place so this comes for free. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:44:00 -05:00
Florian Fainelli	cab247e9df	net: stmmac: Implement ethtool::nway_reset Utilize the generic phy_ethtool_nway_reset() helper function to implement an autonegotiation restart. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:44:00 -05:00
David S. Miller	fc3f9146ce	Merge branch 'busypoll-preemption-and-other-optimizations' Eric Dumazet says: ==================== net: busy-poll: allow preemption and other optimizations It is time to have preemption points in sk_busy_loop() and improve its scalability. Also napi_complete() and friends can tell drivers when it is safe to not re-enable device interrupts, saving some overhead under high busy polling. mlx4 and bnx2x are changed accordingly, to show how this busy polling status can be exploited by drivers. Next steps will implement Zach Brown suggestion, where NAPI polling would be enabled all the time for some chosen queues. This is needed for efficient epoll() support anyway. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:40:59 -05:00
Eric Dumazet	80f1c21c53	bnx2x: switch to napi_complete_done() Switch from napi_complete() to napi_complete_done() for better GRO support (gro_flush_timeout) and core NAPI features. Do not rearm interrupts if we are busy polling, to reduce bus and interrupts overhead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:40:58 -05:00
Eric Dumazet	2e71328375	net/mlx4_en: use napi_complete_done() return value Do not rearm interrupts if we are busy polling. mlx4 uses separate CQ for TX and RX, so number of TX interrupts does not change, unfortunately. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:40:58 -05:00
Eric Dumazet	364b605573	net: busy-poll: return busypolling status to drivers NAPI drivers use napi_complete_done() or napi_complete() when they drained RX ring and right before re-enabling device interrupts. In busy polling, we can avoid interrupts being delivered since we are polling RX ring in a controlled loop. Drivers can chose to use napi_complete_done() return value to reduce interrupts overhead while busy polling is active. This is optional, legacy drivers should work fine even if not updated. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:40:58 -05:00
Eric Dumazet	21cb84c48c	net: busy-poll: remove need_resched() from sk_can_busy_loop() Now sk_busy_loop() can schedule by itself, we can remove need_resched() check from sk_can_busy_loop() Also add a const to its struct sock parameter. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:40:58 -05:00
Eric Dumazet	217f697436	net: busy-poll: allow preemption in sk_busy_loop() After commit `4cd13c21b2` ("softirq: Let ksoftirqd do its job"), sk_busy_loop() needs a bit of care : softirqs might be delayed since we do not allow preemption yet. This patch adds preemptiom points in sk_busy_loop(), and makes sure no unnecessary cache line dirtying or atomic operations are done while looping. A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED, so that sk_busy_loop() does not have to grab it again. Similarly, netpoll_poll_lock() is done one time. This gives about 10 to 20 % improvement in various busy polling tests, especially when many threads are busy polling in configurations with large number of NIC queues. This should allow experimenting with bigger delays without hurting overall latencies. Tested: On a 40Gb mlx4 NIC, 32 RX/TX queues. echo 70 >/proc/sys/net/core/busy_read for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done Before: After: 1: 90072 92819 2: 157289 184007 3: 235772 213504 4: 344074 357513 5: 394755 458267 6: 461151 487819 7: 549116 625963 8: 544423 716219 9: 720460 738446 10: 794686 837612 11: 915998 923960 12: 937507 925107 13: 1019677 971506 14: 1046831 1113650 15: 1114154 1148902 16: 1105221 1179263 17: 1266552 `1299585` 18: 1258454 1383817 19: 1341453 1312194 20: 1363557 1488487 21: 1387979 1501004 22: 1417552 1601683 23: 1550049 `1642002` 24: 1568876 1601915 25: 1560239 1683607 26: 1640207 1745211 27: 1706540 1723574 28: 1638518 1722036 29: 1734309 1757447 30: 1782007 1855436 31: 1724806 1888539 32: 1717716 1944297 33: 1778716 1869118 34: 1805738 1983466 35: 1815694 2020758 36: 1893059 2035632 37: 1843406 2034653 38: 1888830 2086580 39: 1972827 2143567 40: 1877729 2181851 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 13:40:57 -05:00
Martin KaFai Lau	2874aa2e46	bpf: Fix compilation warning in __bpf_lru_list_rotate_inactive gcc-6.2.1 gives the following warning: kernel/bpf/bpf_lru_list.c: In function ‘__bpf_lru_list_rotate_inactive.isra.3’: kernel/bpf/bpf_lru_list.c:201:28: warning: ‘next’ may be used uninitialized in this function [-Wmaybe-uninitialized] The "next" is currently initialized in the while() loop which must have >=1 iterations. This patch initializes next to get rid of the compiler warning. Fixes: `3a08c2fd76` ("bpf: LRU List") Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 11:30:56 -05:00
David Lebrun	46738b1317	ipv6: sr: add option to control lwtunnel support This patch adds a new option CONFIG_IPV6_SEG6_LWTUNNEL to enable/disable support of encapsulation with the lightweight tunnels. When this option is enabled, CONFIG_LWTUNNEL is automatically selected. Fix commit `6c8702c60b` ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Without a proper option to control lwtunnel support for SR-IPv6, if CONFIG_LWTUNNEL=n then the IPv6 initialization fails as a consequence of seg6_iptunnel_init() failure with EOPNOTSUPP: NET: Registered protocol family 10 IPv6: Attempt to unregister permanent protocol 6 IPv6: Attempt to unregister permanent protocol 136 IPv6: Attempt to unregister permanent protocol 17 NET: Unregistered protocol family 10 Tested (compiling, booting, and loading ipv6 module when relevant) with possible combinations of CONFIG_IPV6={y,m,n}, CONFIG_IPV6_SEG6_LWTUNNEL={y,n} and CONFIG_LWTUNNEL={y,n}. Reported-by: Lorenzo Colitti <lorenzo@google.com> Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-16 11:29:46 -05:00
David S. Miller	21b23daeba	Merge branch 'alx-multiqueue-support' Tobias Regnery says: ==================== alx: add multi queue support This patchset lays the groundwork for multi queue support in the alx driver and enables multi queue support for the tx path by default. The hardware supports up to 4 tx queues. Benefits are better utilization of multi core cpus and the usage of the msi-x support by default which splits the handling of rx / tx and misc other interrupts. The rx path is a little bit harder because apparently (based on the limited information from the downstream driver) the hardware supports up to 8 rss queues but only has one hardware descriptor ring on the rx side. So the rx path will be part of another patchset. Tested on my AR8161 ethernet adapter with different tests: - there are no regressions observed during my daily usage - iperf tcp and udp tests shows no performance regressions - netperf TCP_RR and UDP_RR shows a slight performance increase of about 1-2% with this patchset applied This work is based on the downstream driver at github.com/qca/alx Changes in V2: - drop unneeded casts in alx_alloc_rx_ring (Patch 1) - add additional information about testing and benefit to the changelog ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-15 22:46:31 -05:00
Tobias Regnery	d768319cd4	alx: enable multiple tx queues Enable multiple tx queues by default based on the number of online cpus. The hardware supports up to four tx queues. Based on the downstream driver at github.com/qca/alx Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-15 22:46:30 -05:00
Tobias Regnery	f58e0f7747	alx: enable msi-x interrupts by default Remove the module parameter to enable msi-x support and enable msi-x interrupts unconditionally by default. This is a preparatory step to enable multi queue support by default, because this is only working with msi-x interrupts. Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-15 22:46:30 -05:00
Tobias Regnery	2e06826bc6	alx: prepare tx path for multi queue support This patch prepares the tx path to send data on multiple tx queues. It introduces per queue register adresses and uses them in the alx_tx_queue structs. There are new helper functions for the queue mapping in the tx path. Based on the downstream driver at github.com/qca/alx Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-15 22:46:30 -05:00

1 2 3 4 5 ...

635607 Commits All Branches Search

635607 Commits

All Branches