OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Haiyang Zhang	273de02ae7	hv_netvsc: Add handlers for ethtool get/set msg level The handlers for ethtool get/set msg level are missing from netvsc. This patch adds them. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:55:00 -04:00
Colin Ian King	7c6f97475d	net: vxge: fix spelling mistake in macro VXGE_HW_ERR_PRIVILAGED_OPEARATION Rename VXGE_HW_ERR_PRIVILAGED_OPEARATION to VXGE_HW_ERR_PRIVILEGED_OPERATION to fix spelling mistake. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:50:02 -04:00
David S. Miller	75a839c34b	Merge branch 'udp-gso-fixes' Willem de Bruijn says: ==================== udp gso fixes A few small fixes: - disallow segmentation with XFRM - do not leak gso packets into the ingress path Changes v1 -> v2 - fix build failure in team.c - drop scatter-gather fix: this is now fixed by commit `113f99c335` ("net: test tailroom before appending to linear skb"). After this patch gso skbs are built non-linear regardless of NETIF_F_SG and skb_segment builds linear segs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:48:44 -04:00
Willem de Bruijn	8eea1ca82b	gso: limit udp gso to egress-only virtual devices Until the udp receive stack supports large packets (UDP GRO), GSO packets must not loop from the egress to the ingress path. Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual devices through NETIF_F_GSO_ENCAP_ALL as this included devices that may loop packets, such as veth and macvlan. Instead add it to specific devices that forward to another device's egress path, bonding and team. Fixes: `83aa025f53` ("udp: add gso support to virtual devices") CC: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:48:44 -04:00
Willem de Bruijn	ff06342cbc	udp: exclude gso from xfrm paths UDP GSO delays final datagram construction to the GSO layer. This conflicts with protocol transformations. Fixes: `bec1f6f697` ("udp: generate gso with UDP_SEGMENT") CC: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:48:44 -04:00
David S. Miller	e89e59c08d	Merge branch 'net-sfp-small-improvements' Antoine Tenart says: ==================== net: sfp: small improvements A small series of patches improving the SFP support by adding a warning when no Tx disable pin is available, and making the i2c-bus property mandatory. Thanks! Antoine Since v1: - Removed the patch fixing the sfp driver when no i2c bus was described. - Made two new patches to make the i2c-bus property mandatory for sfp modules. Since the phylink series: - s/-EOPNOTSUPP/-ENODEV/ in patch 1/2. - I added the acked-by tag in patch 2/2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:34:28 -04:00
Antoine Tenart	3e48439347	Documentation/bindings: net: the sfp i2c-bus property is now mandatory The i2c-bus property for sfp modules was made mandatory. Update the documentation to keep it in sync with the driver's behaviour. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:34:27 -04:00
Antoine Tenart	66ede1f9c9	net: phy: sfp: make the i2c-bus dt property mandatory This patch makes the i2c-bus property mandatory when using a device tree. If the sfp i2c bus isn't described it's impossible to guess the protocol to use for a given module, and the sfp module would then not work in most cases. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:34:27 -04:00
Antoine Tenart	a1f5d1f0df	net: phy: sfp: warn the user when no tx_disable pin is available In case no Tx disable pin is available the SFP modules will always be emitting. This could be an issue when using modules using laser as their light source as we would have no way to disable it when the fiber is removed. This patch adds a warning when registering an SFP cage which do not have its tx_disable pin wired or available. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:34:27 -04:00
David S. Miller	47de868bd6	Merge branch 'nfp-abm-add-basic-support-for-advanced-buffering-NIC' Jakub Kicinski says: ==================== nfp: abm: add basic support for advanced buffering NIC This series lays groundwork for advanced buffer management NIC feature. It makes necessary NFP core changes, spawns representors and adds devlink glue. Following series will add the actual buffering configuration (patch series size limit). First three patches add support for configuring NFP buffer pools via a mailbox. The existing devlink APIs are used for the purpose. Third patch allows us to perform small reads from the NFP memory. The rest of the patch set adds eswitch mode change support and makes the driver spawn appropriate representors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:20 -04:00
Jakub Kicinski	51c1df83e3	nfp: assign vNIC id as phys_port_name of vNICs which are not ports When NFP is modelled as a switch we assign phys_port_name to respective port(representor )s: vNIC0 - \| - PF port (pf%d) MAC/PHY (p%d[s%d]) - \|E== In most cases there is only one vNIC for communication with the switch. If there is more than one we need to be able to identify them. Use %d as phys_port_name of the vNICs. We don't have to pass ID to nfp_net_debugfs_vnic_add() separately any more. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:19 -04:00
Jakub Kicinski	290f54db31	nfp: use split in naming of PCIe PF ports PCI PFs can host more than one logical endpoint. In NFP terms this means having more than one vNIC for PCIe PF. The vNICs are usually corresponding 1:1 to Ethernet ports. In core NIC we use the legacy idea of vNIC being the Ethernet port, hence netdevs put pX(sY) in their phys_port_name, like Ethernet ports would. When ASIC ports are fully represented we need to be able to name different PCIe PF ports, too. Use a scheme similar to Ethernet ports - pfXsY, for PCIe PF number X, sub-port Y. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:19 -04:00
Jakub Kicinski	1f70036723	nfp: abm: force Ethternet port up Current control firmware does not cater too well to multi-host applications. There is no way to check which hosts are up or otherwise negotiate what the state of the external port (the Ethernet port) should be. Make sure the link is up when driver loads, and don't take it down when Ethernet port netdev is closed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:19 -04:00
Jakub Kicinski	d05d902eda	nfp: abm: spawn port netdevs To configure buffering points we need full set of netdevs: ASIC user netdev -- \| -- PCIe port MAC port -- \| -- Configuring egrees qdiscs on user netdev configures standard Linux TC software qdiscs, configuring PCIe port qdiscs will provide a way of setting ASIC queuing parameters for PCIe block. MAC port netdev egress qdiscs correspond to ASIC MAC Traffic Manager block. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:19 -04:00
Jakub Kicinski	4afa3af418	nfp: add devlink_eswitch_mode_set callback Our previous apps all assumed to use only one eswitch mode (legacy or switchdev) without the ability to change it. ABM NIC will want to support the switch so plumb devlink_eswitch_mode_set through. The devlink_eswitch_mode_set is expected to spawn representors and potentially devlink ports so it's called under big devlink lock and pf->lock. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:19 -04:00
Jakub Kicinski	7ac1cc9aef	devlink: don't take instance lock around eswitch mode set Changing switch mode may want to register and unregister devlink ports. Therefore similarly to DEVLINK_CMD_PORT_SPLIT/UNSPLIT it should not take the instance lock. Drivers don't depend on existing locking since it's a very recent addition. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:19 -04:00
Jakub Kicinski	634c6b7a85	nfp: add app pointer to port representors nfp_apps can currently associate their structures with vNICs but not representors. Add app priv pointer to representors as well. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:18 -04:00
Jakub Kicinski	cc54dc2804	nfp: abm: create project-specific vNIC structure ABM NIC requires more complex vNIC handling, allocate per-vNIC structure. Find out RX queue base and PCI PF id. There will be multiple PFs sharing the same MAC port, therefore the MAC address assigned to the vNIC must be looked up in the HWInfo database. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:18 -04:00
Jakub Kicinski	c4c8f39a57	nfp: abm: add initial active buffer management NIC skeleton Add a very rudimentary active buffer management NIC support. For now it's like a core NIC without SR-IOV support. Next commits will extend its functionality. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:18 -04:00
Jakub Kicinski	b586c77b3c	nfp: core: allow 4-byte aligned accesses to Memory Units Current code doesn't enforce length requirements on 32bit accesses with action NFP_CPP_ACTION_RW to memory units, but if the access is only aligned to 4 bytes as well we will fall into the explicit access case and error out. Such accesses are correct, allow them by lowering the width earlier. While at it use a switch statement to improve readability. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:18 -04:00
Jakub Kicinski	a0d163f432	nfp: add shared buffer configuration Allow app FW to advertise its shared buffer pool information. Use the per-PF mailbox to configure them from devlink. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:18 -04:00
Jakub Kicinski	0c693323a1	nfp: add support for per-PCI PF mailbox When working with devlink-related functionality for locking reasons it's easier to create a new mailbox per-PCI PF device than try to use one of the netdev/vNIC mailboxes. Define new mailbox structure and resolve its symbol during probe. For forward compatibility allow silent truncation of mailbox command data. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:18 -04:00
Jakub Kicinski	8f6196f63c	nfp: move rtsym helpers to pf code nfp_net_pf_rtsym_read_optional() and nfp_net_pf_map_rtsym() are not really related to networking code. Move them to the PF code and remove the net from their names. They will soon be needed by code outside of nfp_net_main.c anyway. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 14:26:18 -04:00
David S. Miller	e95a5f5489	Merge branch 'bpfilter' Alexei Starovoitov says: ==================== bpfilter v2->v3: - followed Luis's suggestion and significantly simplied first patch with shmem_kernel_file_setup+kernel_write. Added kdoc for new helper - fixed typos and race to access pipes with mutex - tested with bpfilter being 'builtin'. CONFIG_BPFILTER_UMH=y\|m both work. Interesting to see a usermode executable being embedded inside vmlinux. - it doesn't hurt to enable bpfilter in .config. ip_setsockopt commands sent to usermode via pipes and -ENOPROTOOPT is returned from userspace, so kernel falls back to original iptables code v1->v2: this patch set is almost a full rewrite of the earlier umh modules approach The v1 of patches and follow up discussion was covered by LWN: https://lwn.net/Articles/749108/ I believe the v2 addresses all issues brought up by Andy and others. Mainly there are zero changes to kernel/module.c Instead of teaching module loading logic to recognize special umh module, let normal kernel modules execute part of its own .init.rodata as a new user space process (Andy's idea) Patch 1 introduces this new helper: int fork_usermode_blob(void data, size_t len, struct umh_info info); Input: data + len == executable file Output: struct umh_info { struct file pipe_to_umh; struct file pipe_from_umh; pid_t pid; }; Advantages vs v1: - the embedded user mode executable is stored as .init.rodata inside normal kernel module. These pages are freed when .ko finishes loading - the elf file is copied into tmpfs file. The user mode process is swappable. - the communication between user mode process and 'parent' kernel module is done via two unix pipes, hence protocol is not exposed to user space - impossible to launch umh on its own (that was the main issue of v1) and impossible to be man-in-the-middle due to pipes - bpfilter.ko consists of tiny kernel part that passes the data between kernel and umh via pipes and much bigger umh part that doing all the work - 'lsmod' shows bpfilter.ko as usual. 'rmmod bpfilter' removes kernel module and kills corresponding umh - signed bpfilter.ko covers the whole image including umh code Few issues: - the user can still attach to the process and debug it with 'gdb /proc/pid/exe pid', but 'gdb -p pid' doesn't work. (a bit worse comparing to v1) - tinyconfig will notice a small increase in .text +766 \| TEXT \| 7c8b94806bec umh: introduce fork_usermode_blob() helper ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 13:23:40 -04:00
Alexei Starovoitov	d2ba09c17a	net: add skeleton of bpfilter kernel module bpfilter.ko consists of bpfilter_kern.c (normal kernel module code) and user mode helper code that is embedded into bpfilter.ko The steps to build bpfilter.ko are the following: - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file is converted into bpfilter_umh.o object file with _binary_net_bpfilter_bpfilter_umh_start and _end symbols Example: $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o 0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end 0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size 0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko bpfilter_kern.c is a normal kernel module code that calls the fork_usermode_blob() helper to execute part of its own data as a user mode process. Notice that _binary_net_bpfilter_bpfilter_umh_start - end is placed into .init.rodata section, so it's freed as soon as __init function of bpfilter.ko is finished. As part of __init the bpfilter.ko does first request/reply action via two unix pipe provided by fork_usermode_blob() helper to make sure that umh is healthy. If not it will kill it via pid. Later bpfilter_process_sockopt() will be called from bpfilter hooks in get/setsockopt() to pass iptable commands into umh via bpfilter.ko If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will kill umh as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 13:23:40 -04:00
Alexei Starovoitov	449325b52b	umh: introduce fork_usermode_blob() helper Introduce helper: int fork_usermode_blob(void data, size_t len, struct umh_info info); struct umh_info { struct file pipe_to_umh; struct file pipe_from_umh; pid_t pid; }; that GPLed kernel modules (signed or unsigned) can use it to execute part of its own data as swappable user mode process. The kernel will do: - allocate a unique file in tmpfs - populate that file with [data, data + len] bytes - user-mode-helper code will do_execve that file and, before the process starts, the kernel will create two unix pipes for bidirectional communication between kernel module and umh - close tmpfs file, effectively deleting it - the fork_usermode_blob will return zero on success and populate 'struct umh_info' with two unix pipes and the pid of the user process As the first step in the development of the bpfilter project the fork_usermode_blob() helper is introduced to allow user mode code to be invoked from a kernel module. The idea is that user mode code plus normal kernel module code are built as part of the kernel build and installed as traditional kernel module into distro specified location, such that from a distribution point of view, there is no difference between regular kernel modules and kernel modules + umh code. Such modules can be signed, modprobed, rmmod, etc. The use of this new helper by a kernel module doesn't make it any special from kernel and user space tooling point of view. Such approach enables kernel to delegate functionality traditionally done by the kernel modules into the user space processes (either root or !root) and reduces security attack surface of the new code. The buggy umh code would crash the user process, but not the kernel. Another advantage is that umh code of the kernel module can be debugged and tested out of user space (e.g. opening the possibility to run clang sanitizers, fuzzers or user space test suites on the umh code). In case of the bpfilter project such architecture allows complex control plane to be done in the user space while bpf based data plane stays in the kernel. Since umh can crash, can be oom-ed by the kernel, killed by the admin, the kernel module that uses them (like bpfilter) needs to manage life time of umh on its own via two unix pipes and the pid of umh. The exit code of such kernel module should kill the umh it started, so that rmmod of the kernel module will cleanup the corresponding umh. Just like if the kernel module does kmalloc() it should kfree() it in the exit code. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-23 13:23:39 -04:00
David S. Miller	1fe8c06c4a	Merge branch 'qed-firmware-TLV' Sudarsana Reddy Kalluru says: ==================== qed*: Add support for management firmware TLV request. Management firmware (MFW) requires config and state information from the driver. It queries this via TLV (type-length-value) request wherein mfw specificies the list of required TLVs. Driver fills the TLV data and responds back to MFW. This patch series adds qed/qede/qedf/qedi driver implementation for supporting the TLV queries from MFW. Changes from previous versions: ------------------------------- v2: Split patch (2) into multiple simpler patches. v2: Update qed_tlv_parsed_buf->p_val datatype to void pointer to avoid bunch of unnecessary typecasts. Please consider applying this series to "net-next". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:55 -04:00
Manish Rangankar	269afb3603	qedi: Add get_generic_tlv_data handler. Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Manish Rangankar	534bbdf883	qedi: Add support for populating ethernet TLVs. This patch adds callbacks for providing the ethernet protocol driver TLVs. Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Chad Dupuis	8673daf4f5	qedf: Add get_generic_tlv_data handler. Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Chad Dupuis	642a0b37e6	qedf: Add support for populating ethernet TLVs. This patch adds callbacks for providing the ethernet protocol driver TLVs. Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Sudarsana Reddy Kalluru	d25b859ccd	qede: Add support for populating ethernet TLVs. This patch adds callbacks for providing the ethernet protocol driver TLVs. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Sudarsana Reddy Kalluru	59ccf86fe6	qed: Add driver infrastucture for handling mfw requests. MFW requests the TLVs in interrupt context. Extracting of the required data from upper layers and populating of the TLVs require process context. The patch adds work-queues for processing the tlv requests. It also adds the implementation for requesting the tlv values from appropriate protocol driver. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Sudarsana Reddy Kalluru	77a509e4f6	qed: Add support for processing iscsi tlv request. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Sudarsana Reddy Kalluru	f240b68822	qed: Add support for processing fcoe tlv request. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:53 -04:00
Sudarsana Reddy Kalluru	2528c38993	qed: Add support for tlv request processing. The patch adds driver support for processing TLV requests/repsonses from the mfw and upper driver layers respectively. The implementation reads the requested TLVs from the shared memory, requests the values from upper layer drivers, populates this info (TLVs) shared memory and notifies MFW about the TLV values. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:53 -04:00
Sudarsana Reddy Kalluru	dd006921d6	qed: Add MFW interfaces for TLV request support. The patch adds required management firmware (MFW) interfaces such as mailbox commands, TLV types etc. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:53 -04:00
David S. Miller	9c803cfd5f	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-05-22 This series contains updates to i40e only. Jake provides all the changes in this series starting with making it consistent in how we approach the bit lock. Fixed the reporting of the VEB statistics and the queue statistics to always return every queue even if it is not currently in use. Use WARN_ONCE() so that the first time we end up with an incorrect size we will dump a stack trace and a message to help highlight the issue early in testing. Folded the fixed string prefix into the stat string definition. Instead of using a separate char *p pointer when copying strings, use the data pointer directly. Added code comments for several of the statistic functions to better explain the number and ordering of statistics. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:45:06 -04:00
David S. Miller	119768c903	Merge branch 'tcp-ECN-quickack' Eric Dumazet says: ==================== tcp: reduce quickack pressure for ECN Small patch series changing TCP behavior vs quickack and ECN First patch is a refactoring, adding parameter to tcp_incr_quickack() and tcp_enter_quickack_mode() helpers. Second patch implements the change, lowering number of ACK packets sent after an ECN event. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:43:16 -04:00
Eric Dumazet	522040ea5f	tcp: do not aggressively quick ack after ECN events ECN signals currently forces TCP to enter quickack mode for up to 16 (TCP_MAX_QUICKACKS) following incoming packets. We believe this is not needed, and only sending one immediate ack for the current packet should be enough. This should reduce the extra load noticed in DCTCP environments, after congestion events. This is part 2 of our effort to reduce pure ACK packets. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:43:15 -04:00
Eric Dumazet	9a9c9b51e5	tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode We want to add finer control of the number of ACK packets sent after ECN events. This patch is not changing current behavior, it only enables following change. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:43:15 -04:00
Vlad Buslov	290aa0ad74	net: sched: don't disable bh when accessing action idr Initial net_device implementation used ingress_lock spinlock to synchronize ingress path of device. This lock was used in both process and bh context. In some code paths action map lock was obtained while holding ingress_lock. Commit `e1e992e52f` ("[NET_SCHED] protect action config/dump from irqs") modified actions to always disable bh, while using action map lock, in order to prevent deadlock on ingress_lock in softirq. This lock was removed from net_device, so disabling bh, while accessing action map, is no longer necessary. Replace all action idr spinlock usage with regular calls that do not disable bh. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:34:34 -04:00
David S. Miller	73bf1fc58d	Merge branch 'net-ipv6-Fix-route-append-and-replace-use-cases' David Ahern says: ==================== net/ipv6: Fix route append and replace use cases This patch set fixes a few append and replace uses cases for IPv6 and adds test cases that codifies the expectations of how append and replace are expected to work. In paricular it allows a multipath route to have a dev-only nexthop, something Thomas tried to accomplish with commit `edd7ceb782` ("ipv6: Allow non-gateway ECMP for IPv6") which had to be reverted because of breakage, and to replace an existing FIB entry with a reject route. There are a number of inconsistent and surprising aspects to the Linux API for adding, deleting, replacing and changing FIB entries. For example, with IPv4 NLM_F_APPEND means insert the route after any existing entries with the same key (prefix + priority + TOS for IPv4) and NLM_F_CREATE without the append flag inserts the new route before any existing entries. IPv6 on the other hand attempts to guess whether a new route should be appended to an existing one, possibly creating a multipath route, or to add a new entry after any existing ones. This applies to both the 'append' (NLM_F_CREATE + NLM_F_APPEND) and 'prepend' (NLM_F_CREATE only) cases meaning for IPv6 the NLM_F_APPEND is basically ignored. This guessing whether the route should be added to a multipath route (gateway routes) or inserted after existing entries (non-gateway based routes) means a multipath route can not have a dev only nexthop (potentially required in some cases - tunnels or VRF route leaking for example) and route 'replace' is a bit adhoc treating gateway based routes and dev-only / reject routes differently. This has led to frustration with developers working on routing suites such as FRR where workarounds such as delete and add are used instead of replace. After this patch set there are 2 differences between IPv4 and IPv6: 1. 'ip ro prepend' = NLM_F_CREATE only IPv4 adds the new route before any existing ones IPv6 adds new route after any existing ones 2. 'ip ro append' = NLM_F_CREATE\|NLM_F_APPEND IPv4 adds the new route after any existing ones IPv6 adds the nexthop to existing routes converting to multipath For the former, there are cases where we want same prefix routes added after existing ones (e.g., multicast, prefix routes for macvlan when used for virtual router redundancy). Requiring the APPEND flag to add a new route to an existing one helps here but is a slight change in behavior since prepend with gateway routes now create a separate entry. For the latter IPv6 behavior is preferred - appending a route for the same prefix and metric to make a multipath route, so really IPv4 not allowing an existing route to be updated is the limiter. This will be fixed when nexthops become separate objects - a future patch set. Thank you to Thomas and Ido for testing earlier versions of this set, and to Ido for providing an update to the mlxsw driver. Changes since RFC - cleanup wording in test script; add comments about expected failures and why ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:20 -04:00
David Ahern	abb1860aac	selftests: fib_tests: Add ipv4 route add append replace tests Add IPv4 route tests covering add, append and replace permutations. Assumes the ability to add a basic single path route works; this is required for example when adding an address to an interface. $ fib_tests.sh -t ipv4_rt IPv4 route add / append tests TEST: Attempt to add duplicate route - gw [ OK ] TEST: Attempt to add duplicate route - dev only [ OK ] TEST: Attempt to add duplicate route - reject route [ OK ] TEST: Add new nexthop for existing prefix [ OK ] TEST: Append nexthop to existing route - gw [ OK ] TEST: Append nexthop to existing route - dev only [ OK ] TEST: Append nexthop to existing route - reject route [ OK ] TEST: Append nexthop to existing reject route - gw [ OK ] TEST: Append nexthop to existing reject route - dev only [ OK ] TEST: add multipath route [ OK ] TEST: Attempt to add duplicate multipath route [ OK ] TEST: Route add with different metrics [ OK ] TEST: Route delete with metric [ OK ] IPv4 route replace tests TEST: Single path with single path [ OK ] TEST: Single path with multipath [ OK ] TEST: Single path with reject route [ OK ] TEST: Single path with single path via multipath attribute [ OK ] TEST: Invalid nexthop [ OK ] TEST: Single path - replace of non-existent route [ OK ] TEST: Multipath with multipath [ OK ] TEST: Multipath with single path [ OK ] TEST: Multipath with single path via multipath attribute [ OK ] TEST: Multipath with reject route [ OK ] TEST: Multipath - invalid first nexthop [ OK ] TEST: Multipath - invalid second nexthop [ OK ] TEST: Multipath - replace of non-existent route [ OK ] Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:19 -04:00
David Ahern	f9a5a9d89f	selftests: fib_tests: Add ipv6 route add append replace tests Add IPv6 route tests covering add, append and replace permutations. Assumes the ability to add a basic single path route works; this is required for example when adding an address to an interface. $ fib_tests.sh -t ipv6_rt IPv6 route add / append tests TEST: Attempt to add duplicate route - gw [ OK ] TEST: Attempt to add duplicate route - dev only [ OK ] TEST: Attempt to add duplicate route - reject route [ OK ] TEST: Add new route for existing prefix (w/o NLM_F_EXCL) [ OK ] TEST: Append nexthop to existing route - gw [ OK ] TEST: Append nexthop to existing route - dev only [ OK ] TEST: Append nexthop to existing route - reject route [ OK ] TEST: Append nexthop to existing reject route - gw [ OK ] TEST: Append nexthop to existing reject route - dev only [ OK ] TEST: Add multipath route [ OK ] TEST: Attempt to add duplicate multipath route [ OK ] TEST: Route add with different metrics [ OK ] TEST: Route delete with metric [ OK ] IPv6 route replace tests TEST: Single path with single path [ OK ] TEST: Single path with multipath [ OK ] TEST: Single path with reject route [ OK ] TEST: Single path with single path via multipath attribute [ OK ] TEST: Invalid nexthop [ OK ] TEST: Single path - replace of non-existent route [ OK ] TEST: Multipath with multipath [ OK ] TEST: Multipath with single path [ OK ] TEST: Multipath with single path via multipath attribute [ OK ] TEST: Multipath with reject route [ OK ] TEST: Multipath - invalid first nexthop [ OK ] TEST: Multipath - invalid second nexthop [ OK ] TEST: Multipath - replace of non-existent route [ OK ] Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:19 -04:00
David Ahern	7df15e6c3e	selftests: fib_tests: Add option to pause after each test Add option to pause after each test before cleanup is done. Allows user to do manual inspection or more ad-hoc testing after each test with the setup in tact. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:19 -04:00
David Ahern	1c7447b4e8	selftests: fib_tests: Add command line options Add command line options for controlling pause on fail, controlling specific tests to run and verbose mode rather than relying on environment variables. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:19 -04:00
David Ahern	37ce42c14e	selftests: fib_tests: Add success-fail counts As more tests are added, it is convenient to have a tally at the end. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:18 -04:00
David Ahern	f34436a430	net/ipv6: Simplify route replace and appending into multipath route Bring consistency to ipv6 route replace and append semantics. Remove rt6_qualify_for_ecmp which is just guess work. It fails in 2 cases: 1. can not replace a route with a reject route. Existing code appends a new route instead of replacing the existing one. 2. can not have a multipath route where a leg uses a dev only nexthop Existing use cases affected by this change: 1. adding a route with existing prefix and metric using NLM_F_CREATE without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls 'prepend'). Existing code auto-determines that the new nexthop can be appended to an existing route to create a multipath route. This change breaks that by requiring the APPEND flag for the new route to be added to an existing one. Instead the prepend just adds another route entry. 2. route replace. Existing code replaces first matching multipath route if new route is multipath capable and fallback to first matching non-ECMP route (reject or dev only route) in case one isn't available. New behavior replaces first matching route. (Thanks to Ido for spotting this one) Note: Newer iproute2 is needed to display multipath routes with a dev-only nexthop. This is due to a bug in iproute2 and parsing nexthops. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:18 -04:00
David Ahern	5a15a1b07c	mlxsw: spectrum_router: Add support for route append Handle append for gateway based routes. Dev-only multipath routes will be handled by a follow on patch. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:18 -04:00

1 2 3 4 5 ...

755105 Commits All Branches Search

755105 Commits

All Branches