OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Tonghao Zhang	dc151282bb	net: vhost: factor out busy polling logic to vhost_net_busy_poll() Factor out the generic busy polling logic so that it can be used in the tx path in the next patch. With this patch, qemu can also set a different busyloop_timeout for the rx queue. To avoid duplicate code, introduce the helper functions: * sock_has_rx_data (changed from sk_has_rx_data) * vhost_net_busy_poll_try_queue Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:25:55 -07:00
Tonghao Zhang	a6a67a2f34	net: vhost: replace magic number of lock annotation Use the VHOST_NET_VQ_XXX as a subclass for mutex_lock_nested. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:25:55 -07:00
Tonghao Zhang	78139c94dc	net: vhost: lock the vqs one by one This patch changes the locking from taking all vq locks at the same time to locking the vqs one by one. The next patch relies on this to avoid a deadlock. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:25:54 -07:00
Yafang Shao	af4325ecc2	tcp: expose sk_state in tcp_retransmit_skb tracepoint With sk_state exposed, we can tell in which state a retransmission occurs, which gives more detail for diagnostics. For example, if the retransmission occurs in SYN_SENT state, it may indicate that the SYN packet was dropped on the remote peer because its SYN backlog queue was full, and we can then check the remote peer. BTW, SYNACK retransmission is traced in the tcp_retransmit_synack tracepoint. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:07:19 -07:00
YueHaibing	0a71515665	net: faraday: fix return type of ndo_start_xmit function The method ndo_start_xmit() is defined as returning a 'netdev_tx_t', which is a typedef for an enum type, so make sure the implementation in this driver returns a 'netdev_tx_t' value, and change the function return type to netdev_tx_t. Found by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:18:08 -07:00
YueHaibing	6323d57f33	net: smsc: fix return type of ndo_start_xmit function The method ndo_start_xmit() is defined as returning a 'netdev_tx_t', which is a typedef for an enum type, so make sure the implementation in this driver returns a 'netdev_tx_t' value, and change the function return type to netdev_tx_t. Found by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:15:17 -07:00
zhong jiang	880e1b2111	net: liquidio: list usage cleanup Trivial cleanup: list_move_tail() does the same thing as list_del() + list_add_tail(), so just replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:12:10 -07:00
zhong jiang	631e871edc	net: qed: list usage cleanup Trivial cleanup: list_move_tail() does the same thing as list_del() + list_add_tail(), so just replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:11:36 -07:00
David S. Miller	30b0594a3e	Merge branch 'net-bridge-convert-bool-options-to-bits' Nikolay Aleksandrov says: ==================== net: bridge: convert bool options to bits A lot of boolean bridge options have been added around the net_bridge structure resulting in holes and more importantly different cache lines that need to be fetched in the fast path. This set moves all of those to bits in a bitfield which resides in a hot cache line thus reducing the size of net_bridge, the number of holes and the number of cache lines needed for the fast path. The set is also sent in preparation for new boolean options to avoid spreading them in the structure and making new holes. One nice side-effect is that we avoid potential race conditions by using the bitops since some of the options were bits being directly set in parallel risking hard to debug issues (has_ipv6_addr). Before: size: 1184, holes: 8, sum holes: 30 After: size: 1160, holes: 3, sum holes: 7 Patch 01 is a trivial style fix Patch 02 adds the new options bitfield and converts the vlan boolean options to bits Patches 03-08 convert the rest of the boolean options to bits Patch 09 re-arranges a few fields in net_bridge to further reduce size v2: patch 09: remove the comment about offload_fwd_mark in net_bridge and leave it where it is now, thanks to Ido for spotting it ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov	35750b0bca	net: bridge: pack net_bridge better Further reduce the size of net_bridge by 8 bytes and reduce the number of holes in it: Before: holes: 5, sum holes: 15 After: holes: 3, sum holes: 7 Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov	3341d91702	net: bridge: convert mtu_set_by_user to a bit Convert the last remaining bool option to a bit thus reducing the overall net_bridge size further by 8 bytes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov	c69c2cd444	net: bridge: convert neigh_suppress_enabled option to a bit Convert the neigh_suppress_enabled option to a bit. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov	675779adbf	net: bridge: convert mcast options to bits This patch converts the rest of the mcast options to bits. It also packs the mcast options a little better by moving multicast_mld_version to an existing hole, reducing the net_bridge size by 8 bytes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov	13cefad2f2	net: bridge: convert and rename mcast disabled Convert mcast disabled to an option bit and while doing so convert the logic to check if multicast is enabled instead. That is make the logic follow the option value - if it's set then mcast is enabled and vice versa. This avoids a few confusing places where we inverted the value that's being set to follow the mcast_disabled logic. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov	be3664a038	net: bridge: convert group_addr_set option to a bit Convert group_addr_set internal bridge opt to a bit. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov	8df3510f28	net: bridge: convert nf call options to bits No functional change, convert of nf_call_[ip\|ip6\|arp]tables to bits. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov	ae75767ec2	net: bridge: add bitfield for options and convert vlan opts Bridge options have usually been added as separate fields all over the net_bridge struct taking up space and ending up in different cache lines. Let's move them to a single bitfield to save up space and speedup lookups. This patch adds a simple API for option modifying and retrieving using bitops and converts the first user of the API - the bridge vlan options (vlan_enabled and vlan_stats_enabled). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:22 -07:00
Nikolay Aleksandrov	1c1cb6d032	net: bridge: make struct opening bracket consistent Currently we have a mix of opening brackets on new lines and on the same line, let's move them all on the same line. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:04:22 -07:00
David S. Miller	37ac5db6e6	Merge branch 's390-net-next' Julian Wiedmann says: ==================== s390/net: updates 2018-09-26 please apply one more series of cleanups and small improvements for qeth to net-next. Note that one patch needs to touch both af_iucv and qeth, in order to untangle their receive paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:08 -07:00
Julian Wiedmann	91cc98f51e	s390/qeth: remove duplicated carrier state tracking The netdevice is always available, apply any carrier state changes to it without caching them. On a STARTLAN event (ie. carrier-up), defer updating the state to qeth_core_hardsetup_card() in the subsequent recovery action. Also remove the carrier-state checks from the xmit routines. Stopping transmission on carrier-down is the responsibility of upper-level code (eg see dev_direct_xmit()). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:08 -07:00
Julian Wiedmann	d782d80f36	s390/qeth: clean up drop conditions for received cmds If qeth_check_ipa_data() consumed an event, there's no point in processing it further. So drop it early, and make the surrounding code a tiny bit more readable. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	d19b93f40e	s390/qeth: re-indent qeth_check_ipa_data() Pull one level of checking up into qeth_send_control_data_cb(), and clean up an else-after-return. No functional change. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	68bba11643	s390/qeth: consume local address events We have no code that is waiting for these events, so just drop them when they arrive. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	6585ac4e5d	s390/qeth: remove various redundant code 1. tracing iob->rc makes no sense when it hasn't been modified by the callback, 2. the qeth_dbf_list is declared with LIST_HEAD, which also initializes the list, 3. the ccwgroup core only calls the thaw/restore callbacks if the gdev is online, so we don't have to check for it again. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	8d908eb045	s390/qeth: remove CARD_FROM_CDEV helper The cdev-to-card translation walks through two layers of drvdata, with no locking or refcounting (where eg. the ccwgroup core only accesses a cdev's drvdata while holding the ccwlock). This might be safe for now, but any careless usage of the helper has the potential for subtle races and use-after-free's. Luckily there's only one occurrence where we _really_ need it (in qeth_irq()), for any other user we can just pass through an appropriate card pointer. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	8f6637b878	s390/qeth: pass card pointer in iob callback This allows us to remove the CARD_FROM_CDEV calls in the iob callbacks. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	6a3123d076	s390/qeth: re-use qeth_notify_skbs() When not using the CQ, this allows us to avoid the second skb queue walk in qeth_release_skbs(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	5a5312bdba	s390/qeth: remove additional skb refcount This was presumably left over from back when qeth recursed into dev_queue_xmit(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	dc149e3764	s390/qeth: replace open-coded skb_queue_walk() To match the use of __skb_queue_purge(), also make the skb's enqueue in qeth_fill_buffer() lockless. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	cd11d11286	net/af_iucv: locate IUCV header via skb_network_header() This patch attempts to untangle the TX and RX code in qeth from af_iucv's respective HiperTransport path: On the TX side, pointing skb_network_header() at the IUCV header means that qeth_l3_fill_af_iucv_hdr() no longer needs a magical offset to access the header. On the RX side, qeth pulls the (fake) L2 header off the skb like any normal ethernet driver would. This makes working with the IUCV header in af_iucv easier, since we no longer have to assume a fixed skb layout. While at it, replace the open-coded length checks in af_iucv's RX path with pskb_may_pull(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	a2eb0ad50c	s390/qeth: on gdev release, reset drvdata qeth_core_probe_device() sets the gdev's drvdata, but doesn't reset it on a subsequent error. Move the (re-)setting around a bit, so that it happens symmetrically on allocating/freeing the qeth_card struct. This is no actual problem, as the ccwgroup core will discard the gdev on a probe error. But from qeth's perspective the gdev is an external resource, so it's best to manage it cleanly. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	c1a935f6ec	s390/qeth: fix discipline unload after setup error Device initialization code usually first loads a subdriver (via qeth_core_load_discipline()), and then runs its setup() callback. If this fails, it rolls back the load via qeth_core_free_discipline(). qeth_core_free_discipline() expects the options.layer attribute to be initialized, but on error in setup() that's currently not the case, resulting in misbalanced symbol_put() calls. Fix this by setting options.layer when loading the subdriver. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	a70fee3b0f	s390/qeth: use DEFINE_MUTEX for qeth_mod_mutex Consolidate declaration and initialization of a static variable. While at it reduce its scope in qeth_core_load_discipline(), and simplify the return logic accordingly. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
Julian Wiedmann	4fda335476	s390/qeth: convert layer attribute to enum While the raw values are fixed due to their use in a sysfs attribute, we can still use the proper QETH_DISCIPLINE_* enum within the driver. Also move the initialization into qeth_set_initial_options(), along with all other user-configurable fields. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 09:56:07 -07:00
David S. Miller	4b1bd69769	net: phy: marvell: Fix build. Local variable 'autoneg' doesn't even exist: drivers/net/phy/marvell.c: In function 'm88e1121_config_aneg': drivers/net/phy/marvell.c:468:25: error: 'autoneg' undeclared (first use in this function); did you mean 'put_net'? if (phydev->autoneg != autoneg \|\| changed) { ^~~~~~~ Fixes: `d6ab933647` ("net: phy: marvell: Avoid unnecessary soft reset") Reported-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 22:41:31 -07:00
Roopa Prabhu	7aca011f88	bridge: br_arp_nd_proxy: set icmp6_router if neigh has NTF_ROUTER Fixes: `ed842faeb2` ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:33:21 -07:00
David S. Miller	105bc1306e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-09-25 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Allow for RX stack hardening by implementing the kernel's flow dissector in BPF. Idea was originally presented at netconf 2017 [0]. Quote from merge commit: [...] Because of the rigorous checks of the BPF verifier, this provides significant security guarantees. In particular, the BPF flow dissector cannot get inside of an infinite loop, as with CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot read outside of packet bounds, because all memory accesses are checked. Also, with BPF the administrator can decide which protocols to support, reducing potential attack surface. Rarely encountered protocols can be excluded from dissection and the program can be updated without kernel recompile or reboot if a bug is discovered. [...] Also, a sample flow dissector has been implemented in BPF as part of this work, from Petar and Willem. [0] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf 2) Add support for bpftool to list currently active attachment points of BPF networking programs providing a quick overview similar to bpftool's perf subcommand, from Yonghong. 3) Fix a verifier pruning instability bug where a union member from the register state was not cleared properly leading to branches not being pruned despite them being valid candidates, from Alexei. 4) Various smaller fast-path optimizations in XDP's map redirect code, from Jesper. 5) Enable bpftool to recognize BPF_MAP_TYPE_REUSEPORT_SOCKARRAY maps, from Roman. 6) Remove a duplicate check in libbpf that probes for function storage, from Taeung. 7) Fix an issue in test_progs by not checking errno, since its value should not be checked on success, from Mauricio. 
8) Fix unused variable warning in bpf_getsockopt() helper when CONFIG_INET is not configured, from Anders. 9) Fix a compilation failure in the BPF sample code's use of bpf_flow_keys, from Prashant. 10) Minor cleanups in BPF code, from Yue and Zhong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:29:38 -07:00
Hauke Mehrtens	3475372ff6	net: dsa: lantiq_gswip: Depend on HAS_IOMEM The driver uses devm_ioremap_resource() which is only available when CONFIG_HAS_IOMEM is set, make the driver depend on this config option. User mode Linux does not have CONFIG_HAS_IOMEM set and the driver was failing on this architecture. Fixes: `14fceff477` ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:27:43 -07:00
David S. Miller	921f432cea	Merge branch 'net-phy-Eliminate-unnecessary-soft' Florian Fainelli says: ==================== net: phy: Eliminate unnecessary soft This patch series eliminates unnecessary software resets of the PHY. This should hopefully not break anybody's hardware; but I would appreciate testing to make sure this is the case. Sorry for this long email list, I wanted to make sure I reached out to all people who made changes to the Marvell PHY driver. Thank you! Changes since RFT: - added Tested-by tags from Wang, Dongsheng, Andrew, Chris and Clemens ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:26:45 -07:00
Florian Fainelli	d6ab933647	net: phy: marvell: Avoid unnecessary soft reset The BMCR.RESET bit on the Marvell PHYs has a special meaning in that it commits the register writes into the HW for it to latch and be configured appropriately. Doing software resets causes link drops, and this is unnecessary disruption if nothing changed. Determine from marvell_set_polarity()'s return code whether the register value was changed and if it was, propagate that to the logic that hits the software reset bit. This avoids doing unnecessary soft reset if the PHY is configured in the same state it was previously, this also eliminates the need for a m88e1111_config_aneg() function since it now is the same as marvell_config_aneg(). Tested-by: Wang, Dongsheng <dongsheng.wang@hxt-semitech.com> Tested-by: Chris Healy <cphealy@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:26:45 -07:00
Florian Fainelli	6e2d85ec05	net: phy: Stop with excessive soft reset While consolidating the PHY reset in phy_init_hw() with an unconditional BMCR soft-reset, I became quite trigger happy with those. This was later on deactivated for the Generic PHY driver on the premise that a prior software entity (e.g: bootloader) might have applied workarounds in commit `0878fff1f4` ("net: phy: Do not perform software reset for Generic PHY"). Since we have a hook to wire-up a soft_reset callback, just use that and get rid of the call to genphy_soft_reset() entirely. This speeds up initialization and link establishment for most PHYs out there that do not require a reset. Fixes: `87aa9f9c61` ("net: phy: consolidate PHY reset in phy_init_hw()") Tested-by: Wang, Dongsheng <dongsheng.wang@hxt-semitech.com> Tested-by: Chris Healy <cphealy@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:26:45 -07:00
David S. Miller	71f9b61c5b	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-09-25 This series contains updates to i40e and xsk. Mariusz fixes an issue where the VF link state was not being updated properly when the PF is down or up. Also cleaned up the promiscuous configuration during a VF reset. Patryk simplifies the code a bit to use the variables for PF and HW that are declared, rather than using the VSI pointers. Cleaned up the message length parameter to several virtchnl functions, since it was not being used (or needed). Harshitha fixes two potential race conditions when trying to change VF settings by creating a helper function to validate that the VF is enabled and that the VSI is set up. Sergey corrects a double "link down" message by putting in a check for whether or not the link is up or going down. Björn addresses an AF_XDP zero-copy issue where buffers passed from userspace to the kernel were leaked when the hardware descriptor ring was torn down. A zero-copy capable driver picks buffers off the fill ring and places them on the hardware receive ring to be completed at a later point when DMA is complete. Similarly on the transmit side: the driver picks buffers off the transmit ring and places them on the transmit hardware ring. In the typical flow, the receive buffer will be placed onto a receive ring (completed to the user), and the transmit buffer will be placed on the completion ring to notify the user that the transfer is done. However, if the driver needs to tear down the hardware rings for some reason (interface goes down, reconfiguration and such), the userspace buffers cannot be leaked. They have to be reused or completed back to userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:25:00 -07:00
David S. Miller	7a153655d7	Merge branch 'Refactor-classifier-API-to-work-with-Qdisc-blocks-without-rtnl-lock' Vlad Buslov says: ==================== Refactor classifier API to work with Qdisc/blocks without rtnl lock Currently, all netlink protocol handlers for updating rules, actions and qdiscs are protected with single global rtnl lock which removes any possibility for parallelism. This patch set is a third step to remove rtnl lock dependency from TC rules update path. Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added. Handlers registered with this flag are called without RTNL taken. End goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered with UNLOCKED flag to allow parallel execution. However, there is no intention to completely remove or split rtnl lock itself. This patch set addresses specific problems in implementation of classifiers API that prevent its control path from being executed concurrently. Additional changes are required to refactor classifiers API and individual classifiers for parallel execution. This patch set lays groundwork to eventually register rule update handlers as rtnl-unlocked by modifying code in cls API that works with Qdiscs and blocks. Following patch set does the same for chains and classifiers. The goal of this change is to refactor tcf_block_find() and its dependencies to allow concurrent execution: - Extend Qdisc API with rcu to lookup and take reference to Qdisc without relying on rtnl lock. - Extend tcf_block with atomic reference counting and rcu. - Always take reference to tcf_block while working with it. - Implement tcf_block_release() to release resources obtained by tcf_block_find() - Create infrastructure to allow registering Qdiscs with class ops that do not require the caller to hold rtnl lock. 
All three netlink rule update handlers use tcf_block_find() to lookup Qdisc and block, and this patch set introduces additional means of synchronization to substitute rtnl lock in cls API. Some functions in cls and sch APIs have historic names that no longer clearly describe their intent. In order not to make this code even more confusing when introducing their concurrency-friendly versions, rename these functions to describe their actual implementation. Changes from V2 to V3: - Patch 1: - Explicitly include refcount.h in rtnetlink.h. - Patch 3: - Move rcu_head field to the end of struct Qdisc. - Rearrange local variable declarations in qdisc_lookup_rcu(). - Patch 5: - Remove tcf_qdisc_put() and inline its content to callers. Changes from V1 to V2: - Rebase on latest net-next. - Patch 8 - remove. - Patch 9 - fold into patch 11. - Patch 11: - Rename tcf_block_{get\|put}() to tcf_block_refcnt_{get\|put}(). - Patch 13 - remove. ==================== Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:17:36 -07:00
Vlad Buslov	787ce6d02d	net: sched: use reference counting for tcf blocks on rules update In order to remove dependency on rtnl lock on rules update path, always take reference to block while using it on rules update path. Change tcf_block_get() error handling to properly release block with reference counting, instead of just destroying it, in order to accommodate potential concurrent users. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:17:36 -07:00
Vlad Buslov	0607e43994	net: sched: implement tcf_block_refcnt_{get\|put}() Implement get/put function for blocks that only take/release the reference and perform deallocation. These functions are intended to be used by unlocked rules update path to always hold reference to block while working with it. They rely on the new fine-grained locking mechanisms introduced in previous patches in this set, instead of relying on global protection provided by rtnl lock. Extract code that is common with tcf_block_detach_ext() into common function __tcf_block_put(). Extend tcf_block with rcu to allow safe deallocation when it is accessed concurrently. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:17:36 -07:00
Vlad Buslov	ab2816295f	net: sched: protect block idr with spinlock Protect block idr access with spinlock, instead of relying on rtnl lock. Take tn->idr_lock spinlock during block insertion and removal. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:17:36 -07:00
Vlad Buslov	f00234367b	net: sched: implement functions to put and flush all chains Extract code that flushes and puts all chains on tcf block to two standalone function to be shared with functions that locklessly get/put reference to block. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:17:36 -07:00
Vlad Buslov	cfebd7e242	net: sched: change tcf block reference counter type to refcount_t As a preparation for removing rtnl lock dependency from rules update path, change tcf block reference counter type to refcount_t to allow modification by concurrent users. In block put function perform decrement and check reference counter once to accommodate concurrent modification by unlocked users. After this change tcf_chain_put at the end of block put function is called with block->refcnt==0 and will deallocate block after the last chain is released, so there is no need to manually deallocate block in this case. However, if block reference counter reached 0 and there are no chains to release, block must still be deallocated manually. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:17:36 -07:00
Vlad Buslov	e368fdb61d	net: sched: use Qdisc rcu API instead of relying on rtnl lock As a preparation for removing rtnl lock dependency from rules update path, use Qdisc rcu and reference counting capabilities instead of relying on rtnl lock while working with Qdiscs. Create new tcf_block_release() function, and use it to free resources taken by tcf_block_find(). Currently, this function only releases Qdisc and it is extended in next patches in this series. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:17:36 -07:00
Vlad Buslov	9d7e82cec3	net: sched: add helper function to take reference to Qdisc Implement function to take reference to Qdisc that relies on rcu read lock instead of rtnl mutex. Function only takes reference to Qdisc if reference counter isn't zero. Intended to be used by unlocked cls API. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 20:17:35 -07:00
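
The last commit above describes taking a reference to a Qdisc only if its reference counter isn't zero. A minimal user-space sketch of that pattern follows, using C11 atomics. It is not the kernel's code: the kernel has its own refcount_t helpers for this (for example refcount_inc_not_zero()), and the names demo_obj, demo_get_unless_zero() and demo_put() are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted object such as a Qdisc. */
struct demo_obj {
	atomic_uint refcnt;
};

/*
 * Take a reference only if the counter is not already zero.
 * A zero counter means the object is being torn down, so a
 * concurrent lookup must not resurrect it.
 */
static bool demo_get_unless_zero(struct demo_obj *obj)
{
	unsigned int old = atomic_load(&obj->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&obj->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller released the last one. */
static bool demo_put(struct demo_obj *obj)
{
	unsigned int old = atomic_fetch_sub(&obj->refcnt, 1);

	assert(old > 0); /* dropping a reference that was never held is a bug */
	return old == 1;
}
```

A caller that gets false from demo_get_unless_zero() should treat the object as already gone; the caller that gets true from demo_put() is the one responsible for freeing it.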

783721 Commits All Branches Search