linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Mike Maloney	472ecf084a	selftests/net: Fix broken test case in psock_fanout The error return value from sock_fanout_open is -1, not zero. One test case was checking for 0 instead of -1. Tested: Built and tested in clean client. Signed-off-by: Mike Maloney <maloney@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:56:17 -04:00
Ivan Khoronzhuk	799dbe3e1c	net: ethernet: ti: netcp_core: remove unused compl queue mapping This code is unused and probably was unintentionally left while moving completion queue mapping in submit function. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:54:47 -04:00
David S. Miller	c1a9f80e04	Merge branch 'qed-vf-tunnel' Manish Chopra says: ==================== qed/qede: VF tunnelling support With this series VFs can run vxlan/geneve/gre tunnels over it. Please consider applying this series to "net-next" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:49:33 -04:00
Chopra, Manish	eaf3c0c6b4	qed - VF tunnelling support [VXLAN/GENEVE/GRE] This patch adds hardware channel APIs support between VF and PF for tunnelling configuration for the VFs. According to that configuration VFs can run VXLAN/GENEVE/GRE tunnels over it with tunnel features offloaded. Using these APIs a VF can also request UDP port configuration from the PF, although the PF and its child VFs share the same port. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:49:32 -04:00
Chopra, Manish	97379f15c2	qed/qede: Add UDP ports in bulletin board This patch adds support for UDP ports in bulletin board to notify the VFs of UDP port changes. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:49:31 -04:00
Chopra, Manish	327a2b750c	qede: Configure UDP ports in local context. This patch configures UDP ports locally instead of configuring them in deferred context which would be helpful in synchronizing UDP ports configuration for VFs which will be enabled in further patches. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:49:31 -04:00
Chopra, Manish	369bfd4ec7	qede: Disable tunnel offloads for non offloaded UDP ports This patch disables tunnel offloads via ndo_features_check() if given UDP port is not offloaded to hardware. This in turn allows running multiple tunnel interfaces using different UDP ports. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:49:31 -04:00
Chopra, Manish	19489c7f0d	qed/qede: Enable tunnel offloads based on hw configuration This patch enables tunnel feature offloads based on hw configuration at initialization time instead of enabling them always. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:49:31 -04:00
Chopra, Manish	1996843012	qed: refactor tunnelling - API/Structs This patch changes the tunnel APIs to use per tunnel info instead of using bitmasks for all tunnels and also uses single struct to hold the data to prepare multiple variant of tunnel configuration ramrods to be sent to the hardware. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:49:30 -04:00
David S. Miller	36784277c5	Merge branch 'l2tpeth-info' Guillaume Nault says: ==================== l2tp: add information about l2tpeth interfaces in /sys Patch #1 lets userspace retrieve the naming scheme of an l2tpeth interface, using /sys/class/net/<iface>/name_assign_type. Patch #2 adds the DEVTYPE field in /sys/class/net/<iface>/uevent so that userspace can reliably know if a device is an l2tpeth interface. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:41:57 -04:00
Guillaume Nault	a485c2b877	l2tp: define "l2tpeth" device type Export type of l2tpeth interfaces to userspace (/sys/class/net/<iface>/uevent). Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:41:56 -04:00
Guillaume Nault	c39855febc	l2tp: set name_assign_type for devices created by l2tp_eth.c Export naming scheme used when creating l2tpeth interfaces (/sys/class/net/<iface>/name_assign_type). This lets userspace know if the device's name has been generated automatically or defined manually. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:41:56 -04:00
Jamal Hadi Salim	e0ee84ded7	net sched actions: Complete the JUMPX opcode per discussion at netconf/netdev: When we have an action that is capable of branching (example a policer), we can achieve a continuation of the action graph by programming a "continue" where we find an exact replica of the same filter rule with a lower priority and the remainder of the action graph. When you have 100s of thousands of filters which require such a feature it gets very inefficient to do two lookups. This patch completes a leftover feature of action codes. Its time has come. Example below where a user labels packets with a different skbmark on ingress of a port depending on whether they have/not exceeded the configured rate. This mark is then used to make further decisions on some egress port. #rate control, very low so we can easily see the effect sudo $TC actions add action police rate 1kbit burst 90k \ conform-exceed pipe/jump 2 index 10 # skbedit index 11 will be used if the user conforms sudo $TC actions add action skbedit mark 11 ok index 11 # skbedit index 12 will be used if the user does not conform sudo $TC actions add action skbedit mark 12 ok index 12 #lets bind the user .. sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \ match ip dst 127.0.0.8/32 flowid 1:10 \ action police index 10 \ action skbedit index 11 \ action skbedit index 12 #run a ping -f and see what happens.. 
# jhs@foobar:~$ sudo $TC -s filter ls dev $ETH parent ffff: protocol ip filter pref 8 u32 filter pref 8 u32 fh 800: ht divisor 1 filter pref 8 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2800 success 1005) match 7f000008/ffffffff at 16 (success 1005 ) action order 1: police 0xa rate 1Kbit burst 23440b mtu 2Kb action pipe/jump 2 overhead 0b ref 2 bind 1 installed 207 sec used 122 sec Action statistics: Sent 84420 bytes 1005 pkt (dropped 0, overlimits 721 requeues 0) backlog 0b 0p requeues 0 action order 2: skbedit mark 11 pass index 11 ref 2 bind 1 installed 204 sec used 122 sec Action statistics: Sent 60564 bytes 721 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 3: skbedit mark 12 pass index 12 ref 2 bind 1 installed 201 sec used 122 sec Action statistics: Sent 23856 bytes 284 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Not bad, about 28% non-conforming packets.. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:30:06 -04:00
David S. Miller	45a6f3bca6	linux-can-next-for-4.12-20170425 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlj/Cw8THG1rbEBwZW5n dXRyb25peC5kZQAKCRAe/sog7Dgc9keOB/9dFFKUSqXEbevRCVj8Hc/tpmUnAYDP xcIpz/8GRHPrkOx/tpqtzkAQjeiNzcrT5LDPLDoMSpubZDJjNTKGfcb5sOvGqK9P IyY4dv0DO9/z1zxdpkK7CkR+g9Z3w9mEdQl2OS0yxbOXRPgX5Sl44Tp5xWgvJhOc s60m/Y60PQ22CSee7EBYWCwvJPfLIdsr5AIM6wtbEveZU13afAFbyIqoP/97RZKF sJ8NfGwQmRcD+AHw1nB/YfhNh4NEBE52IiBKf3zgC8Y8lDId/Wve/j/MnJGzeK48 eXPU3g1QaJTuEddn4xC0RRMKycR9klwfGJkY1cKSLtCWQxd1cfHKLdWY =P474 -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-4.12-20170425' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2017-04-25 this is a pull request of 21 patches for net-next/master. There are 4 patches by Stephane Grosjean for the PEAK PCAN-PCIe FD CAN-FD boards. The next 7 patches are by Mario Huettel, which add support for M_CAN IP version >= v3.1.x to the m_can driver. A patch by Remigiusz Kołłątaj adds support for the Microchip CAN BUS Analyzer. 8 patches by Oliver Hartkopp complete the initial CAN network namespace support. Wei Yongjun's patch for the ti_hecc driver fixes the return value check in the probe function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 10:52:47 -04:00
Florian Westphal	3133822f5a	ipvlan: use pernet operations and restrict l3s hooks to master netns commit `4fbae7d83c` ("ipvlan: Introduce l3s mode") added registration of netfilter hooks via nf_register_hooks(). This API provides the illusion of 'global' netfilter hooks by placing the hooks in all current and future network namespaces. In case of ipvlan the hook appears to be only needed in the namespace that contains the ipvlan master device (i.e., usually init_net), so placing them in all namespaces is not needed. This switches ipvlan driver to pernet operations, and then only registers hooks in namespaces where an ipvlan master device is set to l3s mode. Extra care has to be taken when the master device is moved to another namespace, as we might have to 'move' the netfilter hooks too. This is done by storing the namespace the ipvlan port was created in. On REGISTER event, do (un)register operations in the old/new namespaces. This will also allow removal of the nf_register_hooks() in a future patch. Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 10:43:22 -04:00
Wei Yongjun	b655f0e96d	can: ti_hecc: fix return value check in ti_hecc_probe() In case of error, the function devm_ioremap_resource() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: `dabf54dd1c` ("can: ti_hecc: Convert TI HECC driver to DT only driver") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 10:03:40 +02:00
Oliver Hartkopp	5e64ebc1c2	can: enable module auto loading for virtual CAN interfaces Autoload the vcan module when a vcan instance is to be created by 'ip link add type vcan' Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:31 +02:00
Oliver Hartkopp	a8f820a380	can: add Virtual CAN Tunnel driver (vxcan) Similar to the virtual ethernet driver veth, vxcan implements a local CAN traffic tunnel between two virtual CAN network devices. See Kconfig entry for details. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:30 +02:00
Oliver Hartkopp	1ef83310b8	can: network namespace support for CAN gateway The CAN gateway was not implemented as per-net in the initial network namespace support by Mario Kicherer (`8e8cda6d73`). This patch enables the CAN gateway to be used in different namespaces. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:30 +02:00
Oliver Hartkopp	384317ef41	can: network namespace support for CAN_BCM protocol The CAN_BCM protocol and its procfs entries were not implemented as per-net in the initial network namespace support by Mario Kicherer (`8e8cda6d73`). This patch adds the missing per-net functionality for the CAN BCM. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:29 +02:00
Oliver Hartkopp	cb5635a367	can: complete initial namespace support The statistics and their proc output were not implemented as per-net in the initial network namespace support by Mario Kicherer (`8e8cda6d73`). This patch adds the missing per-net statistics for the CAN subsystem. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:29 +02:00
Oliver Hartkopp	f2e72f43e7	can: remove obsolete definitions can_rx_alldev_list is a per-net data structure now. Remove its definition here and can_rx_dev_list too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:28 +02:00
Oliver Hartkopp	48452c169d	can: remove obsolete pernet_operations definitions The namespace support for the CAN subsystem does not need any additional memory. So when ".size = 0" there's no extra memory allocated by the system. And therefore ".id" is obsolete too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:28 +02:00
Oliver Hartkopp	a7bbd28f04	can: fix memory leak in initial namespace support The can_rx_alldev_list is a per-net data structure now and allocated in can_pernet_init(). Make sure the memory is free'd in can_pernet_exit() too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:27 +02:00
Remigiusz Kołłątaj	51f3baad7d	can: mcba_usb: Add support for Microchip CAN BUS Analyzer SocketCAN driver for Microchip CAN BUS Analyzer (http://www.microchip.com/development-tools/) Changes in v4: - possible memory leak fixed in mcba_usb_write_bulk_callback - LED support added - failure handling in mcba_usb_probe improved - C99 initializers for structs on stack Changes in v3: - improved/simplified CAN ID conversion - functions for transmission of skb and cmd separated - fixed/improved netif_stop_queue handling - style/cosmetic corrections Changes in v2: - Termination handling reimplemented to fit new netlink API (IFLA_CAN_TERMINATION) - Bitrate handling reimplemented to fit new netlink API (IFLA_CAN_BITRATE) - CAN ID conversion refactored (changed from macro to inline functions) - CAN DLC handling using get_can_dlc() - Endianness handling for can_speed introduced - Debugging removed - Redundant error prints removed - Style/cosmetic corrections (i.e. macro names, redefs, inits etc.) Signed-off-by: Remigiusz Kołłątaj <remigiusz.kollataj@mobica.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:49 +02:00
Mario Huettel	10c1c3975a	can: m_can: Enable TX FIFO Handling for M_CAN IP version >= v3.1.x * Added defines for TX Event FIFO Element * Adapted ndo_start_xmit function. For versions >= v3.1.x it uses the TX FIFO to optimize the data throughput. It stores the echo skb at the same index as in the M_CAN's TX FIFO. The frame's message marker is set to this index. This message marker is received in the TX Event FIFO after the message was successfully transmitted. It is used to echo the correct echo skb back to the network stack. * Added m_can_echo_tx_event function. It reads all received message markers in the TX Event FIFO and loops back the corresponding echo skbs. * ISR checks for new TX Event Entry interrupt for version >= 3.1.x. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:48 +02:00
Mario Huettel	428479e471	can: m_can: Configuration for TX and TX event FIFOs * TX/TX Event FIFO sizes are configured for version >= v3.1.x Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:48 +02:00
Mario Huettel	b03cfc5bb0	can: m_can: Enable M_CAN version dependent initialization This patch adapts the initialization of the M_CAN. So it can be used with all versions >= 3.0.x. Changes: * Added version element to m_can_priv structure to hold M_CAN version. * Renamed bittiming structs for version 3.0.x * Added new bittiming structs for version >= 3.1.x * Function alloc_m_can_dev takes 2 new arguments. The TX FIFO size and the base address of the module. * Chip configuration for CAN_CTRLMODE_LOOPBACK is changed: Enabled CCCR_MON bit. In combination with TEST_LBCK it activates the internal loopback mode. Leaving CCCR_MON '0' results in external loopback mode. * Clocks are temporarily enabled by platform_propbe function in order to allow read access to the Core Release register and the Control Register. Registers are used to detect M_CAN version and optional Non-ISO Feature. Initialization of M_CAN for version >= 3.1.x: * TX FIFO of M_CAN is used to transmit frames. The driver does not need to stop the tx queue after each frame sent. * Initialization of TX Event FIFO is added. * NON-ISO is fixed for all M_CAN versions < 3.2.x. Version 3.2.x _can_ have the NISO (Non-ISO) bit which can switch the mode of the M_CAN to Non-ISO mode. This bit does not have to be writeable. Therefore it is checked. If it is writable Non-ISO support is added to the controllers supported CAN modes. New Functions: * Function to check the Core Release version. The read value determines the behaviour of the driver. * Function to check if the NISO bit for version >= 3.2.x is implemented. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:47 +02:00
Mario Huettel	5e1bd15a37	can: m_can: Updated register defines to newest version * Updated register defines to newest M_CAN version (v3.2.1). * Changed defines in the whole code. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:47 +02:00
Mario Huettel	ee8c3f6f75	can: m_can: Removed virtual address from print The virtual address of the device was printed. I removed it because it leaks internal information. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:46 +02:00
Mario Huettel	8f265895df	can: m_can: Removed initialization of FIFO water marks FIFO water marks disabled because the driver doesn't handle water mark events. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:46 +02:00
Mario Huettel	52973810b5	can: m_can: Disabled Interrupt Line 1 * Disabled interrupt line 1. The driver didn't use it. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:45 +02:00
Stephane Grosjean	8ac8321e4a	can: peak: add support for PEAK PCAN-PCIe FD CAN-FD boards This patch adds the support of the PCAN-PCI Express FD boards made by PEAK-System, for computers using the PCI Express slot. The PCAN-PCI Express FD has one or two CAN FD channels, depending on the model. A galvanic isolation of the CAN ports protects the electronics of the card and the respective computer against disturbances of up to 500 Volts. The PCAN-PCI Express FD can be operated with ambient temperatures in a range of -40 to +85 °C. Such boards run an extended version of the CAN-FD IP running into USB CAN-FD interfaces from PEAK-System, so this patch adds several new commands and their corresponding data types to the PEAK CAN-FD common definitions header file too. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:45 +02:00
Stephane Grosjean	c3df7c5755	can: peak: move header file to new can common subdir The CAN-FD IP from PEAK-System runs into several kinds of PC CAN-FD interfaces. Up to now, only the USB CAN-FD adapters were supported by the Kernel. In order to prepare the adding of some new non-USB CAN-FD interfaces, this patch moves - and rename - the IP definitions file from its private (usb) sub-directory into a - newly created - CAN specific one. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:44 +02:00
Stephane Grosjean	113ab88b2b	can: peak: fix usage of const qualifier in pointers args Fixes the usage of the const qualifier in the memory pointer arguments of the declared inline functions. By changing the line containing "const", this patch also changes the name of the arg into a more usual one. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:44 +02:00
Stephane Grosjean	81c5e13d90	can: peak: fix usage of usb specific data type This patch fixes the wrong usage of a specific USB data type into a common header file. This common header file is intended to define the common data types and values that define access to the PEAK-System CAN-FD IP, whatever the PC interface is. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:43 +02:00
David S. Miller	86a5df1495	Merge branch 'virtio-net-tx-napi' Willem de Bruijn says: ==================== virtio-net tx napi Add napi for virtio-net transmit completion processing. Changes: v2 -> v3: - convert __netif_tx_trylock to __netif_tx_lock on tx napi poll ensure that the handler always cleans, to avoid deadlock - unconditionally clean in start_xmit avoid adding an unnecessary "if (use_napi)" branch - remove virtqueue_disable_cb in patch 5/5 a noop in the common event_idx based loop - document affinity_hint_set constraint v1 -> v2: - disable by default - disable unless affinity_hint_set because cache misses add up to a third higher cycle cost, e.g., in TCP_RR tests. This is not limited to the patch that enables tx completion cleaning in rx napi. - use trylock to avoid contention between tx and rx napi - keep interrupts masked during xmit_more (new patch 5/5) this improves cycles especially for multi UDP_STREAM, which does not benefit from cleaning tx completions on rx napi. - move free_old_xmit_skbs (new patch 3/5) to avoid forward declaration not changed: - deduplicate virnet_poll_tx and virtnet_poll_txclean they look similar, but differ too much to make it worthwhile. 
- delay netif_wake_subqueue for more than 2 + MAX_SKB_FRAGS evaluated, but made no difference - patch 1/5 RFC -> v1: - dropped vhost interrupt moderation patch: not needed and likely expensive at light load - remove tx napi weight - always clean all tx completions - use boolean to toggle tx-napi, instead - only clean tx in rx if tx-napi is enabled - then clean tx before rx - fix: add missing braces in virtnet_freeze_down - testing: add 4KB TCP_RR + UDP test results Based on previous patchsets by Jason Wang: [RFC V7 PATCH 0/7] enable tx interrupts for virtio-net http://lkml.iu.edu/hypermail/linux/kernel/1505.3/00245.html Before commit `b0c39dbdc2` ("virtio_net: don't free buffers in xmit ring") the virtio-net driver would free transmitted packets on transmission of new packets in ndo_start_xmit and, to catch the edge case when no new packet is sent, also in a timer at 10HZ. A timer can cause long stalls. VIRTIO_F_NOTIFY_ON_EMPTY avoids stalls due to low free descriptor count. It does not address stalls due to low socket SO_SNDBUF. Increasing timer frequency decreases that stall time, but increases interrupt rate and, thus, cycle count. Currently, with no timer, packets are freed only at ndo_start_xmit. Latency of consume_skb is now unbounded. To avoid a deadlock if a sock reaches SO_SNDBUF, packets are orphaned on tx. This breaks TCP small queues. Reenable TCP small queues by removing the orphan. Instead of using a timer, convert the driver to regular tx napi. This does not have the unresolved stall issue and does not have any frequency to tune. By keeping interrupts enabled by default, napi increases tx interrupt rate. VIRTIO_F_EVENT_IDX avoids sending an interrupt if one is already unacknowledged, so makes this more feasible today. 
Combine that with an optimization that brings interrupt rate back in line with the existing version for most workloads: Tx completion cleaning on rx interrupts elides most explicit tx interrupts by relying on the fact that many rx interrupts fire. Tested by running {1, 10, 100} {TCP, UDP} STREAM, RR, 4K_RR benchmarks from a guest to a server on the host, on an x86_64 Haswell. The guest runs 4 vCPUs pinned to 4 cores. vhost and the test server are pinned to a core each. All results are the median of 5 runs, with variance well < 10%. Used neper (github.com/google/neper) as test process. Napi increases single stream throughput, but increases cycle cost. The optimizations bring this down. The previous patchset saw a regression with UDP_STREAM, which does not benefit from cleaning tx interrupts in rx napi. This regression is now gone for 10x, 100x. Remaining difference is higher 1x TCP_STREAM, lower 1x UDP_STREAM. The latest results are with process, rx napi and tx napi affine to the same core. All numbers are lower than the previous patchset. upstream napi TCP_STREAM: 1x: Mbps 27816 39805 Gcycles 274 285 10x: Mbps 42947 42531 Gcycles 300 296 100x: Mbps 31830 28042 Gcycles 279 269 TCP_RR Latency (us): 1x: p50 21 21 p99 27 27 Gcycles 180 167 10x: p50 40 39 p99 52 52 Gcycles 214 211 100x: p50 281 241 p99 411 337 Gcycles 218 226 TCP_RR 4K: 1x: p50 28 29 p99 34 36 Gcycles 177 167 10x: p50 70 71 p99 85 134 Gcycles 213 214 100x: p50 442 611 p99 802 785 Gcycles 237 216 UDP_STREAM: 1x: Mbps 29468 26800 Gcycles 284 293 10x: Mbps 29891 29978 Gcycles 285 312 100x: Mbps 30269 30304 Gcycles 318 316 UDP_RR: 1x: p50 19 19 p99 23 23 Gcycles 180 173 10x: p50 35 40 p99 54 64 Gcycles 245 237 100x: p50 234 286 p99 484 473 Gcycles 224 214 Note that GSO is enabled, so 4K RR still translates to one packet per request. Lower throughput at 100x vs 10x can be (at least in part) explained by looking at bytes per packet sent (nstat). 
It likely also explains the lower throughput of 1x for some variants. upstream: N=1 bytes/pkt=16581 N=10 bytes/pkt=61513 N=100 bytes/pkt=51558 at_rx: N=1 bytes/pkt=65204 N=10 bytes/pkt=65148 N=100 bytes/pkt=56840 ==================== Acked-by: Michael S. Tsirkin <mst@redhat.com>	2017-04-24 23:55:20 -04:00
Willem de Bruijn	bdb12e0d2f	virtio-net: keep tx interrupts disabled unless kick Tx napi mode increases the rate of transmit interrupts. Suppress some by masking interrupts while more packets are expected. The interrupts will be reenabled before the last packet is sent. This optimization reduces the throughput drop with tx napi for unidirectional flows such as UDP_STREAM that do not benefit from cleaning tx completions in the receive napi handler. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
Willem de Bruijn	7b0411ef4a	virtio-net: clean tx descriptors from rx napi Amortize the cost of virtual interrupts by doing both rx and tx work on reception of a receive interrupt if tx napi is enabled. With VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion interrupts for bidirectional workloads. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
Willem de Bruijn	ea7735d97b	virtio-net: move free_old_xmit_skbs An upcoming patch will call free_old_xmit_skbs indirectly from virtnet_poll. Move the function above this to avoid having to introduce a forward declaration. This is a pure move: no code changes. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
Willem de Bruijn	b92f1e6751	virtio-net: transmit napi Convert virtio-net to a standard napi tx completion path. This enables better TCP pacing using TCP small queues and increases single stream throughput. The virtio-net driver currently cleans tx descriptors on transmission of new packets in ndo_start_xmit. Latency depends on new traffic, so is unbounded. To avoid deadlock when a socket reaches its snd limit, packets are orphaned on transmission. This breaks socket backpressure, including TSQ. Napi increases the number of interrupts generated compared to the current model, which keeps interrupts disabled as long as the ring has enough free descriptors. Keep tx napi optional and disabled for now. Follow-on patches will reduce the interrupt cost. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
Willem de Bruijn	e4e8452a4a	virtio-net: napi helper functions Prepare virtio-net for tx napi by converting existing napi code to use helper functions. This also deduplicates some logic. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
David S. Miller	14933dc8d9	sparc64: Improve 64-bit constant loading in eBPF JIT. Doing a full 64-bit decomposition is really stupid especially for simple values like 0 and -1. But if we are going to optimize this, go all the way and try for all 2 and 3 instruction sequences not requiring a temporary register as well. First we do the easy cases where it's a zero or sign extended 32-bit number (sethi+or, sethi+xor, respectively). Then we try to find a range of set bits we can load simply then shift up into place, in various ways. Then we try negating the constant and see if we can do a simple sequence using that with a xor at the end. (f.e. the range of set bits can't be loaded simply, but for the negated value it can) The final optimized strategy involves 4 instructions sequences not needing a temporary register. Otherwise we sadly fully decompose using a temp.. Example, from ALU64_XOR_K: 0x0000ffffffff0000 ^ 0x0 = 0x0000ffffffff0000: 0000000000000000 <foo>: 0: 9d e3 bf 50 save %sp, -176, %sp 4: 01 00 00 00 nop 8: 90 10 00 18 mov %i0, %o0 c: 13 3f ff ff sethi %hi(0xfffffc00), %o1 10: 92 12 63 ff or %o1, 0x3ff, %o1 ! ffffffff <foo+0xffffffff> 14: 93 2a 70 10 sllx %o1, 0x10, %o1 18: 15 3f ff ff sethi %hi(0xfffffc00), %o2 1c: 94 12 a3 ff or %o2, 0x3ff, %o2 ! ffffffff <foo+0xffffffff> 20: 95 2a b0 10 sllx %o2, 0x10, %o2 24: 92 1a 60 00 xor %o1, 0, %o1 28: 12 e2 40 8a cxbe %o1, %o2, 38 <foo+0x38> 2c: 9a 10 20 02 mov 2, %o5 30: 10 60 00 03 b,pn %xcc, 3c <foo+0x3c> 34: 01 00 00 00 nop 38: 9a 10 20 01 mov 1, %o5 ! 1 <foo+0x1> 3c: 81 c7 e0 08 ret 40: 91 eb 40 00 restore %o5, %g0, %o0 Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 20:32:15 -07:00
David S. Miller	e3a724edee	sparc64: Support cbcond instructions in eBPF JIT. cbcond combines a compare with a branch into a single instruction. The limitations are: 1) Only newer chips support it 2) For immediate compares we are limited to 5-bit signed immediate values 3) The branch displacement is limited to 10-bit signed 4) We cannot use it for JSET Also, cbcond (unlike all other sparc control transfers) lacks a delay slot. Currently we don't have a useful instruction we can push into the delay slot of normal branches. So using cbcond pretty much always increases code density, and is therefore a win. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 15:56:21 -07:00
David S. Miller	0e43d1009d	Merge branch 'bpf-misc-cleanups' Alexander Alemayhu says: ==================== Misc BPF cleanup while looking into making the Makefile in samples/bpf better handle O= I saw several warnings when running `make clean && make samples/bpf/`. This series reduces those warnings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:20:20 -04:00
Alexander Alemayhu	dfc5be0dc0	samples/bpf: check before defining offsetof Fixes the following warning samples/bpf/test_lru_dist.c:28:0: warning: "offsetof" redefined #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) In file included from ./tools/lib/bpf/bpf.h:25:0, from samples/bpf/libbpf.h:5, from samples/bpf/test_lru_dist.c:24: /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stddef.h:417:0: note: this is the location of the previous definition #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER) Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:20:19 -04:00
Alexander Alemayhu	4784726f69	samples/bpf: add static to function with no prototype Fixes the following warning samples/bpf/cookie_uid_helper_example.c: At top level: samples/bpf/cookie_uid_helper_example.c:276:6: warning: no previous prototype for ‘finish’ [-Wmissing-prototypes] void finish(int ret) ^~~~~~ HOSTLD samples/bpf/per_socket_stats_example Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:20:19 -04:00
Alexander Alemayhu	69b6a7f743	samples/bpf: add -Wno-unknown-warning-option to clang I was initially going to remove '-Wno-address-of-packed-member' because I thought it was not supposed to be there but Daniel suggested using '-Wno-unknown-warning-option'. This silences several warnings similar to the one below warning: unknown warning option '-Wno-address-of-packed-member' [-Wunknown-warning-option] 1 warning generated. clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h \ -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \ -Wno-compare-distinct-pointer-types \ -Wno-gnu-variable-sized-type-not-at-end \ -Wno-address-of-packed-member -Wno-tautological-compare \ -O2 -emit-llvm -c samples/bpf/xdp_tx_iptunnel_kern.c -o -\| llc -march=bpf -filetype=obj -o samples/bpf/xdp_tx_iptunnel_kern.o $ clang --version clang version 3.9.1 (tags/RELEASE_391/final) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/bin Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:20:19 -04:00
Daniel Borkmann	e390b55d5a	bpf: make bpf_xdp_adjust_head support mandatory Now that also the last in-tree user of the xdp_adjust_head bit has been removed, we can remove the flag from struct bpf_prog altogether. This, at the same time, also makes sure that any future driver for XDP comes with bpf_xdp_adjust_head() support right away. A rejection based on this flag would also mean that tail calls couldn't be used with such driver as per `c2002f9837` ("bpf: fix checking xdp_adjust_head on tail calls") fix, thus let's not allow for it in the first place. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:18:10 -04:00
Pan Bian	91ec701a55	qlcnic: fix unchecked return value Function pci_find_ext_capability() may return 0, which is an invalid address. In function qlcnic_sriov_virtid_fn(), its return value is used without validation. This may result in invalid memory access bugs. This patch fixes the bug. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:10:53 -04:00
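
The cbcond limits listed in e3a724edee above (newer chips only, 5-bit signed immediates, 10-bit signed branch displacement, no JSET) can be read as a single eligibility check. The following is a minimal C sketch of that reading, not the kernel's code: the names `fits_signed`, `cbcond_usable`, `cbcond_selftest` and all parameters are hypothetical, and the displacement is assumed to be passed in whatever unit the instruction encoding uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `value` fits in a signed field of `bits` bits
 * (two's complement range, e.g. -16..15 for 5 bits). Assumes 1 <= bits <= 63. */
static bool fits_signed(int64_t value, unsigned int bits)
{
	int64_t lo = -((int64_t)1 << (bits - 1));
	int64_t hi = ((int64_t)1 << (bits - 1)) - 1;

	return value >= lo && value <= hi;
}

/* Hypothetical eligibility check mirroring the four limitations in the
 * commit message. All parameter names are illustrative assumptions. */
static bool cbcond_usable(bool cpu_has_cbcond, bool is_jset,
			  bool use_imm, int64_t imm, int64_t disp)
{
	/* 1) only newer chips support it; 4) it cannot be used for JSET */
	if (!cpu_has_cbcond || is_jset)
		return false;

	/* 2) immediate compares are limited to 5-bit signed values */
	if (use_imm && !fits_signed(imm, 5))
		return false;

	/* 3) the branch displacement is limited to 10-bit signed */
	return fits_signed(disp, 10);
}

/* Small self-check of the boundaries described above. */
void cbcond_selftest(void)
{
	assert(cbcond_usable(true, false, true, 15, 511));
	assert(!cbcond_usable(true, false, true, 16, 0));
	assert(!cbcond_usable(true, false, false, 0, 512));
	assert(!cbcond_usable(true, true, false, 0, 0));
}
```

When the check fails, a JIT would presumably fall back to the ordinary compare plus branch pair with its delay slot, which is the case the commit message describes as less dense.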

665482 Commits, All Branches