Commit Graph

649468 Commits

Author SHA1 Message Date
Edward Cree 9b41080125 sfc: insert catch-all filters for encapsulated traffic
8000 series adapters support filtering VXLAN, NVGRE and GENEVE traffic
 based on inner fields, and when the NIC recognises such traffic, it
 does not match unencapsulated traffic filters any more.  So add catch-
 all filters for encapsulated traffic on supporting platforms.
Although recognition of VXLAN and GENEVE is based on UDP ports, and thus
 will not occur until the driver (on the primary PF) notifies the
 firmware of UDP ports to use, NVGRE will always be recognised, hence
 without this patch 8000 series adapters will drop all NVGRE traffic.

Partly based on patches by Jon Cooper <jcooper@solarflare.com>.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 11:59:31 -05:00
Jon Cooper 34e7aefb2a sfc: refactor debug-or-warnings printks
Rationalise several debug-or-warnings printks using netif_cond_dbg
 to make output more consistent.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 11:59:31 -05:00
Edward Cree f617f27653 net: implement netif_cond_dbg macro
For reporting things that may or may not be serious, depending on some
 condition, netif_cond_dbg will check the condition and print the report
 at either dbg (if the condition is true) or the specified level.

Suggested-by: Jon Cooper <jcooper@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 11:59:31 -05:00
Jon Cooper 2d3d4ec016 sfc: fixes to filter restore handling
If the NIC is switched from full-featured to low-latency, encapsulated
 filters are no longer available, and this causes errors. This patch
 removes those filters from the filter table on restore.
Also, if filters which are removed by the above, or which we fail to
 insert when restoring filters, were UC, MC or broadcast default
 filters, invalidate the corresponding vlan->default_filters entry.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 11:59:30 -05:00
Tobias Klauser 2b368b234e net: wan: Remove unused stats member from struct frad_local
The stats member of struct frad_locl is used neither by the dlci nor the
sdla driver, so it might as well be removed.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 11:32:26 -05:00
Rafał Miłecki 0fc9ae1076 net: phy: broadcom: add support for BCM54210E
It's Broadcom PHY simply described as single-port
RGMII 10/100/1000BASE-T PHY. It requires disabling delay skew and GTXCLK
bits.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 11:29:18 -05:00
David S. Miller a00ebc464d linux-can-next-for-4.11-20170124
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEES2FAuYbJvAGobdVQPTuqJaypJWoFAliHTm8THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRA9O6olrKklajYkB/wOJxby2DFi4ukCPyrTpXtkQWmXa6+M
 Wr/nY5PBg88JipQ+u3rl6bW9NMF3U3xtPO6I/ezfn1jfWOTg0jVIecEfqlW8JNAU
 275IOaiQ7jYgGEo+ALUAv2kSe7Fl8HQyaq+9ugMkWUTdv5Vjay1GZgCzyKZzjM8q
 8UOoB1YsXlB9f230HPMLeHcbGHqHCOpfNk9yWNcbd0Up6U7EweXtK4Pf/L8g+bL1
 0LR1QHRoUtmivuIi96KhB7O/+w8WuigWRy7qkfGpFescZQWOHzbwCVHlZ9ZYCOnJ
 eGEP8koksJuMg4OoEUCv0k2v3Txnfr8YNC5MvNdQ06YmoBWwQeamCozL
 =fBeU
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-4.11-20170124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-01-24

this is a pull request of 4 patches for net-next/master.

The first patch by Oliver Hartkopp adds a netlink API to configure the
interface termination of a CAN card. The next two patches are by me and
add a netlink API to query and configure CAN interfaces that only
support fixed bitrates. The last patch by Colin Ian King simplifies the
return path in the softing_cs driver's softingcs_probe() function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 11:19:29 -05:00
Felix Jia 45ce0fd19d net/ipv6: support more tunnel interfaces for EUI64 link-local generation
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 10:25:34 -05:00
Felix Jia d35a00b8e3 net/ipv6: allow sysctl to change link-local address generation mode
The address generation mode for IPv6 link-local can only be configured
by netlink messages. This patch adds the ability to change the address
generation mode via sysctl.

v1 -> v2
Removed the rtnl lock and switch to use RCU lock to iterate through
the netdev list.

v2 -> v3
Removed the addrgenmode variable from the idev structure and use the
systcl storage for the flag.

Simplifed the logic for sysctl handling by removing the supported
for all operation.

Added support for more types of tunnel interfaces for link-local
address generation.

Based the patches from net-next.

v3 -> v4
Removed unnecessary whitespace changes.

Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 10:25:34 -05:00
Florian Fainelli 6a4bc2b4c3 net: Fix ndo_setup_tc comment
Commit 16e5cc6471 ("net: rework setup_tc ndo op to consume
general tc operand") changed the ndo_setup_tc() signature, but did not
update the comments in netdevice.h, so do that now.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 09:15:04 -05:00
David Ahern 1f17e2f2c8 net: ipv6: ignore null_entry on route dumps
lkp-robot reported a BUG:
[   10.151226] BUG: unable to handle kernel NULL pointer dereference at 00000198
[   10.152525] IP: rt6_fill_node+0x164/0x4b8
[   10.153307] *pdpt = 0000000012ee5001 *pde = 0000000000000000
[   10.153309]
[   10.154492] Oops: 0000 [#1]
[   10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70ee162-dirty #10
[   10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   10.158254] task: d0deb000 task.stack: d0e0c000
[   10.159059] EIP: rt6_fill_node+0x164/0x4b8
[   10.159780] EFLAGS: 00010296 CPU: 0
[   10.160404] EAX: 00000000 EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
[   10.161469] ESI: 00000000 EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
[   10.162534]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[   10.163482] CR0: 80050033 CR2: 00000198 CR3: 10d94660 CR4: 000006b0
[   10.164535] Call Trace:
[   10.164993]  ? paravirt_sched_clock+0x9/0xd
[   10.165727]  ? sched_clock+0x9/0xc
[   10.166329]  ? sched_clock_cpu+0x19/0xe9
[   10.166991]  ? lock_release+0x13e/0x36c
[   10.167652]  rt6_dump_route+0x4c/0x56
[   10.168276]  fib6_dump_node+0x1d/0x3d
[   10.168913]  fib6_walk_continue+0xab/0x167
[   10.169611]  fib6_walk+0x2a/0x40
[   10.170182]  inet6_dump_fib+0xfb/0x1e0
[   10.170855]  netlink_dump+0xcd/0x21f

This happens when the loopback device is set down and a ipv6 fib route
dump is requested.

ip6_null_entry is the root of all ipv6 fib tables making it integrated
into the table and hence passed to the ipv6 route dump code. The
null_entry route uses the loopback device for dst.dev but may not have
rt6i_idev set because of the order in which initializations are done --
ip6_route_net_init is run before addrconf_init has initialized the
loopback device. Fixing the initialization order is a much bigger problem
with no obvious solution thus far.

The BUG is triggered when the loopback is set down and the netif_running
check added by a1a22c1206 fails. The fill_node descends to checking
rt->rt6i_idev for ignore_routes_with_linkdown and since rt6i_idev is
NULL it faults.

The null_entry route should not be processed in a dump request. Catch
and ignore. This check is done in rt6_dump_route as it is the highest
place in the callchain with knowledge of both the route and the network
namespace.

Fixes: a1a22c1206("net: ipv6: Keep nexthop of multipath route on admin down")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 18:39:16 -05:00
David Ahern 3b7b2b0acd net: ipv6: remove skb_reserve in getroute
Remove skb_reserve and skb_reset_mac_header from inet6_rtm_getroute. The
allocated skb is not passed through the routing engine (like it is for
IPv4) and has not since the beginning of git time.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 18:36:58 -05:00
David S. Miller b95db6687f Merge branch 'dsa2-pdata-prepatory-patches'
Florian Fainelli says:

====================
net: dsa: Preparatory patches

This patch series extracts the 4 patches of the larger: net: dsa: Support for
pdata in dsa2 while we wait for feedback from Greg KH on the device references.

Changes in v2:

- rebased properly after the multi-MDIO bus support added to mv88e6xxx
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:43:54 -05:00
Florian Fainelli bc1727d242 net: dsa: Move ports assignment closer to error checking
Move the assignment of ports in _dsa_register_switch() closer to where
it is checked, no functional change. Re-order declarations to be
preserve the inverted christmas tree style.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:43:53 -05:00
Florian Fainelli 3512a8e95e net: dsa: Suffix function manipulating device_node with _dn
Make it clear that these functions take a device_node structure pointer

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:43:53 -05:00
Florian Fainelli 293784a8f8 net: dsa: Make most functions take a dsa_port argument
In preparation for allowing platform data, and therefore no valid
device_node pointer, make most DSA functions takes a pointer to a
dsa_port structure whenever possible. While at it, introduce a
dsa_port_is_valid() helper function which checks whether port->dn is
NULL or not at the moment.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:43:53 -05:00
Florian Fainelli 55ed0ce089 net: dsa: Pass device pointer to dsa_register_switch
In preparation for allowing dsa_register_switch() to be supplied with
device/platform data, pass down a struct device pointer instead of a
struct device_node.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:43:52 -05:00
Satanand Burla 80c8eae6ee liquidio: Avoid accessing skb after submitting to input queue
Accessing skb after submitting to input queue can cause
access to stale pointers if the skb ends up being transmitted
and freed by that time.

Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:42:18 -05:00
David S. Miller 49b3eb7725 This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - ignore self-generated loop detect MAC addresses in translation table,
    by Simon Wunderlich
 
  - install uapi batman_adv.h header, by Sven Eckelmann
 
  - bump copyright years, by Sven Eckelmann
 
  - Remove an unused variable in translation table code, by Sven Eckelmann
 
  - Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to Davids
    suggestion), and a follow up code clean up, by Gao Feng (2 patches)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAliKJcgWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoYzFEACGNOZcW2bFFpuE5CZWZpTW1n24
 YAY/3C+THc89oUyn7ZbWpZ05HcG+JZXOWCWHaRTnd4UeA+sbki53Ioo52ZGgo79C
 7LsyXMMH3F2NrXVEfCK/MUGZcqBAxMd4dcDPsYz5q5osxydjG3hEJLikqOluop3P
 JKK9FTEeP2JeoWz9eEf7Io2te6EwcIjo3TRa9f53uyyhPnk5eS2NeSle5axZqS7c
 l+NSZVlrrG7oUB0UdzABt5NWOvjDc+Lqp1dkoJo17PHOgXialIfjZOKUlKtjVx6E
 06ow2HZaEYCtdlb0awjyPxtIpwiy0szTzJa/h4XtzeQHgbOKIqJWAEb80X7imHVE
 aljP7A1uuGm0bcVQ+pq21PX8yLu4RCDPDE5Khu9atkiQP5+sVEdGiJ8Soaw4PmoD
 yhDmXshrPGR8u5txN8gaHWG4MHt19645s8dHqHQ7tf5h+mf2QXQ2v/jJQsCV2UfY
 vLd/JOD4ZVYWDCspcDdEwGc8KB9r6P31wXuSVjYfkqTFXocdzDr87V7C69CS0U+b
 Lzel8oa/eVa2ppR+OhpELxhL2ahO7p1jZI2ix4NHftx5MAV4WJ3RfR2ev2Sf9Ukc
 aGjzjuml3JVo2i+11lqiAtOaBK9wDNv1CaC+D2NmC6wpWzWQryNryw3fm2SoH3Xm
 S+UoBDZpik+SIEBv3Q==
 =cd1Y
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20170126' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - ignore self-generated loop detect MAC addresses in translation table,
   by Simon Wunderlich

 - install uapi batman_adv.h header, by Sven Eckelmann

 - bump copyright years, by Sven Eckelmann

 - Remove an unused variable in translation table code, by Sven Eckelmann

 - Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to Davids
   suggestion), and a follow up code clean up, by Gao Feng (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 14:31:08 -05:00
Arnd Bergmann 27d807180a ISDN: eicon: reduce stack size of sig_ind function
I noticed that this function uses a lot of kernel stack when the
"latent entropy" plugin is enabled:

drivers/isdn/hardware/eicon/message.c: In function 'sig_ind':
drivers/isdn/hardware/eicon/message.c:6113:1: error: the frame size of 1168 bytes is larger than 1152 bytes [-Werror=frame-larger-than=]

We currently don't warn about this, as we raise the warning limit
to 2048 bytes in mainline, but I'd like to lower that limit again
in the future, and this function can easily be changed to be more
efficient and avoid that warning, by making some of its local
variables 'const'.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 14:00:12 -05:00
Gao Feng c33705188c batman-adv: Treat NET_XMIT_CN as transmit successfully
The tc could return NET_XMIT_CN as one congestion notification, but
it does not mean the packet is lost. Other modules like ipvlan,
macvlan, and others treat NET_XMIT_CN as success too.

So batman-adv should handle NET_XMIT_CN also as NET_XMIT_SUCCESS.

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
[sven@narfation.org: Moved NET_XMIT_CN handling to batadv_send_skb_packet]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:41:18 +01:00
Gao Feng 0843f197c4 batman-adv: Remove one condition check in batadv_route_unicast_packet
It could decrease one condition check to collect some statements in the
first condition block.

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:37:01 +01:00
Sven Eckelmann 269cee6218 batman-adv: Remove unused variable in batadv_tt_local_set_flags
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:34:20 +01:00
Sven Eckelmann ac79cbb96b batman-adv: update copyright years for 2017
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:34:19 +01:00
Sven Eckelmann e60bf3ea67 uapi: install batman_adv.h header
09748a22f4 ("batman-adv: add generic netlink family for batman-adv")
introduced the new batman_adv.h which describes the netlink attributes and
commands of batman-adv. But the Kbuild entry to install the header was not
added.

All currently known tools ship their own copy of batman_adv.h but it should
be installed anyway to later be able to migrate to the system batman_adv.h.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:34:19 +01:00
Simon Wunderlich d3e9768ab9 batman-adv: don't add loop detect macs to TT
The bridge loop avoidance (BLA) feature of batman-adv sends packets to
probe for Mesh/LAN packet loops. Those packets are not sent by real
clients and should therefore not be added to the translation table (TT).

Signed-off-by: Simon Wunderlich <simon.wunderlich@open-mesh.com>
2017-01-26 08:34:18 +01:00
Arnd Bergmann 5b9d6b154a bridge: move maybe_deliver_addr() inside #ifdef
The only caller of this new function is inside of an #ifdef checking
for CONFIG_BRIDGE_IGMP_SNOOPING, so we should move the implementation
there too, in order to avoid this harmless warning:

net/bridge/br_forward.c:177:13: error: 'maybe_deliver_addr' defined but not used [-Werror=unused-function]

Fixes: 6db6f0eae6 ("bridge: multicast to unicast")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 23:16:06 -05:00
Daniel Borkmann 0901df3aea bpf: use prefix_len in test_tag when reading fdinfo
We currently used len instead of prefix_len for the strncmp() in
fdinfo on the prog_tag. It still worked as we matched on the correct
output line also with first 8 instead of 10 chars, but lets fix it
properly to use the intended length.

Fixes: 62b6466026 ("bpf: add prog tag test case to bpf selftests")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 23:15:28 -05:00
David S. Miller 1541f98cc2 Merge branch 'broadcom-phy-cleanup'
Rafał Miłecki says:

====================
net-next: Broadcom PHY driver cleanup

I will probably need to use broadcom.ko for PHY connected to interface
of bgmac supported device so I started looking at it willing to
understand it better.

I found AUXCTL part of the driver / lib a bit confusing and hard to read
so I'm trying to clean it up a bit. I hope this patchset makes following
AUXCTL operations much easier making it clear which defines are for
registers and which for values.

There is no functional change in this pachset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 23:13:45 -05:00
Rafał Miłecki 5e7bfa6cb0 net: phy: bcm-phy-lib: clean up remaining AUXCTL register defines
1) Use 0x%02x format for register number. This follows some other
   defines and makes it easier to distinct register from values.
2) Put register define above values and sort the values. It makes
   reading header code easier.
3) Use 0x%04x format for all values. It's about consistency with other
   values (and most of the header) not a personal preference.
4) Separate define for reading shift value with an extre empty line.
   It's user for all AUXCTL registers in a bcm54xx_auxctl_read.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 23:13:44 -05:00
Rafał Miłecki 8293c7bcde net: phy: broadcom: drop duplicated define for RGMII SKEW delay
We had two defines for the same bit (both were used with the
MII_BCM54XX_AUXCTL_SHDWSEL_MISC register).

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 23:13:44 -05:00
Rafał Miłecki 85b4685da5 net: phy: broadcom: use auxctl reading helper in BCM54612E code
Starting with commit 5b4e290051 ("net: phy: broadcom: add
bcm54xx_auxctl_read") we have a reading helper so use it and avoid code
duplication.
It also means we don't need MII_BCM54XX_AUXCTL_SHDWSEL_MISC define as
it's the same as MII_BCM54XX_AUXCTL_SHDWSEL_MISC just for reading needs
(same value shifted by 12 bits).

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 23:13:44 -05:00
Andrew Lunn 434502930f net: dsa: Mop up remaining NET_DSA_HWMON references
Previous patches have moved the temperature sensor code into the
Marvell PHYs. A few now dead references to NET_DSA_HWMON were left
behind. Go reap them.

Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:45:05 -05:00
Tomáš Pilař 6eacfb54ea sfc: reduce severity of PIO buffer alloc failures
PIO buffer allocation can fail for two valid reasons:
 - we've run out of them (results in -ENOSPC)
 - the NIC configuration doesn't support them (results in -EPERM)
Since both these failures are expected netif_err is excessive.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:44:00 -05:00
David S. Miller 761095c238 Merge branch 'thunderx-ethtool'
Sunil Goutham says:

====================
thunderx: More ethtool support and BGX configuration changes

These patches adds support to set queue sizes from ethtool and changes
the way serdes lane configuration is done by BGX driver on 81/83xx
platforms.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:42:37 -05:00
Sunil Goutham fff37fdad9 net: thunderx: Leave serdes lane config on 81/83xx to firmware
For DLMs and SLMs on 80/81/83xx, many lane configurations
across different boards are coming up. Also kernel doesn't have
any way to identify board type/info and since firmware does,
just get rid of figuring out lane to serdes config and take
whatever has been programmed by low level firmware.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:42:37 -05:00
Sunil Goutham fff4ffdde1 net: thunderx: Support to configure queue sizes from ethtool
Adds support to set Rx/Tx queue sizes from ethtool. Fixes
an issue with retrieving queue size. Also sets SQ's CQ_LIMIT
based on configured Tx queue size such that HW doesn't process
SQEs when there is no sufficient space in CQ.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:42:36 -05:00
Willy Tarreau 3979ad7e82 net/tcp-fastopen: make connect()'s return case more consistent with non-TFO
Without TFO, any subsequent connect() call after a successful one returns
-1 EISCONN. The last API update ensured that __inet_stream_connect() can
return -1 EINPROGRESS in response to sendmsg() when TFO is in use to
indicate that the connection is now in progress. Unfortunately since this
function is used both for connect() and sendmsg(), it has the undesired
side effect of making connect() now return -1 EINPROGRESS as well after
a successful call, while at the same time poll() returns POLLOUT. This
can confuse some applications which happen to call connect() and to
check for -1 EISCONN to ensure the connection is usable, and for which
EINPROGRESS indicates a need to poll, causing a loop.

This problem was encountered in haproxy where a call to connect() is
precisely used in certain cases to confirm a connection's readiness.
While arguably haproxy's behaviour should be improved here, it seems
important to aim at a more robust behaviour when the goal of the new
API is to make it easier to implement TFO in existing applications.

This patch simply ensures that we preserve the same semantics as in
the non-TFO case on the connect() syscall when using TFO, while still
returning -1 EINPROGRESS on sendmsg(). For this we simply tell
__inet_stream_connect() whether we're doing a regular connect() or in
fact connecting for a sendmsg() call.

Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:12:21 -05:00
David S. Miller eb92f76ef5 Merge branch 'tcp-fastopen-new-API'
Wei Wang says:

====================
net/tcp-fastopen: Add new userspace API support

The patch series is to add support for new userspace API for TCP fastopen
sockets.
In the current code, user has to call sendto()/sendmsg() with special flag
MSG_FASTOPEN for TCP fastopen sockets. This API is quite different from the
normal TCP socket API and can be cumbersome for applications to make use
fastopen sockets.
So this new patch introduces a new way of using TCP fastopen sockets which
is similar to normal TCP sockets with a new sockopt TCP_FASTOPEN_CONNECT.
More details about it is described in the third patch.
(First 2 patches are preparations for the third patch.)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:04:39 -05:00
Wei Wang 19f6d3f3c8 net/tcp-fastopen: Add new API support
This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
alternative way to perform Fast Open on the active side (client). Prior
to this patch, a client needs to replace the connect() call with
sendto(MSG_FASTOPEN). This can be cumbersome for applications who want
to use Fast Open: these socket operations are often done in lower layer
libraries used by many other applications. Changing these libraries
and/or the socket call sequences are not trivial. A more convenient
approach is to perform Fast Open by simply enabling a socket option when
the socket is created w/o changing other socket calls sequence:
  s = socket()
    create a new socket
  setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
    newly introduced sockopt
    If set, new functionality described below will be used.
    Return ENOTSUPP if TFO is not supported or not enabled in the
    kernel.

  connect()
    With cookie present, return 0 immediately.
    With no cookie, initiate 3WHS with TFO cookie-request option and
    return -1 with errno = EINPROGRESS.

  write()/sendmsg()
    With cookie present, send out SYN with data and return the number of
    bytes buffered.
    With no cookie, and 3WHS not yet completed, return -1 with errno =
    EINPROGRESS.
    No MSG_FASTOPEN flag is needed.

  read()
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
    write() is not called yet.
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
    established but no msg is received yet.
    Return number of bytes read if socket is established and there is
    msg received.

The new API simplifies life for applications that always perform a write()
immediately after a successful connect(). Such applications can now take
advantage of Fast Open by merely making one new setsockopt() call at the time
of creating the socket. Nothing else about the application's socket call
sequence needs to change.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:04:38 -05:00
Wei Wang 25776aa943 net: Remove __sk_dst_reset() in tcp_v6_connect()
Remove __sk_dst_reset() in the failure handling because __sk_dst_reset()
will eventually get called when sk is released. No need to handle it in
the protocol specific connect call.
This is also to make the code path consistent with ipv4.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:04:38 -05:00
Wei Wang 065263f40f net/tcp-fastopen: refactor cookie check logic
Refactor the cookie check logic in tcp_send_syn_data() into a function.
This function will be called else where in later changes.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:04:38 -05:00
hayeswang a9c54ad2c7 r8152: fix the wrong spelling
Replace rumtime with runtime.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 13:24:34 -05:00
Andrew Lunn d234559952 Doc: DT: bindings: net: dsa: marvell.txt: Tabification
Replace spaces with tabs. Fix indentation to be multiples of tabs, not
a mixture or tabs and spaces.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 13:20:00 -05:00
David S. Miller cca316f356 Merge branch 'bpf-tracepoints'
Daniel Borkmann says:

====================
BPF tracepoints

This set adds tracepoints to BPF for better introspection and
debugging. The first two patches are prerequisite for the actual
third patch that adds the tracepoints. I think the first two are
small and straight forward enough that they could ideally go via
net-next, but I'm also open to other suggestions on how to route
them in case that's not applicable (it would reduce potential
merge conflicts on BPF side, though). For details, please see
individual patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 13:17:48 -05:00
Daniel Borkmann a67edbf4fb bpf: add initial bpf tracepoints
This work adds a number of tracepoints to paths that are either
considered slow-path or exception-like states, where monitoring or
inspecting them would be desirable.

For bpf(2) syscall, tracepoints have been placed for main commands
when they succeed. In XDP case, tracepoint is for exceptions, that
is, f.e. on abnormal BPF program exit such as unknown or XDP_ABORTED
return code, or when error occurs during XDP_TX action and the packet
could not be forwarded.

Both have been split into separate event headers, and can be further
extended. Worst case, if they unexpectedly should get into our way in
future, they can also removed [1]. Of course, these tracepoints (like
any other) can be analyzed by eBPF itself, etc. Example output:

  # ./perf record -a -e bpf:* sleep 10
  # ./perf script
  sock_example  6197 [005]   283.980322:      bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
  sock_example  6197 [005]   283.980721:       bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
  sock_example  6197 [005]   283.988423:   bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
  sock_example  6197 [005]   283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
  [...]
  sock_example  6197 [005]   288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
       swapper     0 [005]   289.338243:    bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER

  [1] https://lwn.net/Articles/705270/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 13:17:47 -05:00
Daniel Borkmann 0fe0559179 lib, traceevent: add PRINT_HEX_STR variant
Add support for the __print_hex_str() macro that was added for
tracing, so that user space tools such as perf can understand
it as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 13:17:47 -05:00
Daniel Borkmann 2acae0d5b0 trace: add variant without spacing in trace_print_hex_seq
For upcoming tracepoint support for BPF, we want to dump the program's
tag. Format should be similar to __print_hex(), but without spacing.
Add a __print_hex_str() variant for exactly that purpose that reuses
trace_print_hex_seq().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 13:17:47 -05:00
Eric Dumazet 60b1af3300 tcp: reduce skb overhead in selected places
tcp_add_backlog() can use skb_condense() helper to get better
gains and less SKB_TRUESIZE() magic. This only happens when socket
backlog has to be used.

Some attacks involve specially crafted out of order tiny TCP packets,
clogging the ofo queue of (many) sockets.
Then later, expensive collapse happens, trying to copy all these skbs
into single ones.
This unfortunately does not work if each skb has no neighbor in TCP
sequence order.

By using skb_condense() if the skb could not be coalesced to a prior
one, we defeat these kind of threats, potentially saving 4K per skb
(or more, since this is one page fragment).

A typical NAPI driver allocates gro packets with GRO_MAX_HEAD bytes
in skb->head, meaning the copy done by skb_condense() is limited to
about 200 bytes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 13:13:31 -05:00
David S. Miller 716dcaebed mlx5-updates-2017-24-01
The first seven patches from Or Gerlitz in this series further enhances
 the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the
 TC tunnel key set (encap) and unset (decap) actions.
 
 Or Gerlitz says:
 ========================
 As part of doing this change, few cleanups are done in the IPv4 code,
 later we move to use the full tunnel key info provided to the driver as
 the key for our internal hashing which is used to identify cases where
 the same tunnel is used for encapsulating multiple flows. As done in the
 IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh
 lookups and construction of the IPv6 tunnel headers on the encap path and
 matching on the outer hears in the decap path.
 
 The last patch of the series enlarges the HW FDB size for the switchdev mode,
 so it has now room to contain offloaded flows as many as min(max number
 of HW flow counters supported, max HW table size supported).
 ========================
 
 Next to Or's series you can find several patches handling several topics.
 
 From Mohamad, add support for SRIOV VF min rate guarantee by using the
 TSAR BW share weights mechanism.
 
 From Or, Two patches to enable Eth VFs to query their min-inline value for
 user-space.
 for that we move a mlx5 low level min inline helper function from mlx5
 ethernet driver into the core driver and then use it in mlx5_ib to expose
 the inline mode to rdma applications through libmlx5.
 
 From Kamal Heib, Reduce memory consumption on kdump kernel.
 
 From Shaker Daibes, code reuse in CQE compression control logic
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYh7FNAAoJEEg/ir3gV/o+TjsIAL1e92+5eutBS9ZvhMARi+Tc
 c2V9V8bG8W1RWWTvx1G0aU4nNjWsr5L8Q8gzqpwhrQITBfgpWd+hlnxQCucyhxC3
 AC1qQ+AKREe/C+25D+WJRq34/61ZHEH2rbKZvpZ1O8SuicVPbcvJ9eM+wOEDxwwX
 u5C5kWQ0HRtCcnFiiOYkB+0CQPH7m3+ZzZek+jDowrexHMSE+yl8ZNtaSTX9c9QN
 bE2cPiCVZd7ufKPIwY8LWHBryyl7sh5P+NqzD633OeiqP/pkZsW9A+czyt+d330f
 6XTKOS1PCD+TfHE0sZJT4VMCjICMHrOFbNRZuwcxJQ6NfmwIJZfskX4NLbyGQTI=
 =vF7U
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-24-01

The first seven patches from Or Gerlitz in this series further enhances
the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the
TC tunnel key set (encap) and unset (decap) actions.

Or Gerlitz says:
========================
As part of doing this change, few cleanups are done in the IPv4 code,
later we move to use the full tunnel key info provided to the driver as
the key for our internal hashing which is used to identify cases where
the same tunnel is used for encapsulating multiple flows. As done in the
IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh
lookups and construction of the IPv6 tunnel headers on the encap path and
matching on the outer hears in the decap path.

The last patch of the series enlarges the HW FDB size for the switchdev mode,
so it has now room to contain offloaded flows as many as min(max number
of HW flow counters supported, max HW table size supported).
========================

Next to Or's series you can find several patches handling several topics.

From Mohamad, add support for SRIOV VF min rate guarantee by using the
TSAR BW share weights mechanism.

From Or, Two patches to enable Eth VFs to query their min-inline value for
user-space.
for that we move a mlx5 low level min inline helper function from mlx5
ethernet driver into the core driver and then use it in mlx5_ib to expose
the inline mode to rdma applications through libmlx5.

From Kamal Heib, Reduce memory consumption on kdump kernel.

From Shaker Daibes, code reuse in CQE compression control logic
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 12:49:58 -05:00