Commit Graph

754311 Commits

Author SHA1 Message Date
David S. Miller 4cbd7a7d3c Merge branch 'dsa-Plug-in-PHYLINK-support'
Florian Fainelli says:

====================
net: dsa: Plug in PHYLINK support

This patch series adds PHYLINK support to DSA which is necessary to support more
complex PHY and pluggable modules setups.

Patch series can be found here:

https://github.com/ffainelli/linux/commits/dsa-phylink-v2

This was tested on:

- dsa-loop
- bcm_sf2
- mv88e6xxx
- b53

With a variety of test cases:
- internal & external MDIO PHYs
- MoCA with link notification through interrupt/MMIO register
- built-in PHYs
- ifconfig up/down for several cycles works
- bind/unbind of the drivers

Changes in v2:

- fixed link configuration for mv88e6xxx (Andrew) after introducing polling

This is technically v2 of what was posted back in March 2018, changes from last
time:

- fixed probe/remove of drivers
- fixed missing gpiod_put() for link GPIOs
- fixed polling of link GPIOs (Russell I would need your SoB on the patch you
  provided offline initially, added some modifications to it)
- tested across a wider set of platforms

And everything should still work as expected. Please be aware of the following:

- switch drivers (like bcm_sf2) which may have user-facing network ports using
  fixed links would need to implement phylink_mac_ops to remain functional.
  PHYLINK does not create a phy_device for fixed links, therefore our
  call to adjust_link() from phylink_mac_link_{up,down} would not be calling
  into the driver. This *should not* affect CPU/DSA ports which are configured
  through adjust_link() but have no network devices

- support for SFP/SFF is now possible, but switch drivers will still need some
  modifications to properly support those, including, but not limited to using
  the correct binding information. This will be submitted on top of this series

Please do test on your respective platforms/switches and let me know if you
find any issues, hopefully everything still works like before.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:06 -04:00
Florian Fainelli 58d56fcc39 net: dsa: bcm_sf2: Get rid of PHYLIB functions
Now that we have converted the bcm_sf2 driver to implement PHYLINK MAC
operations, we can remove the PHYLIB callbacks: adjust_link() and
fixed_link_update() which are no longer called by DSA.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:06 -04:00
Florian Fainelli aab9c4067d net: dsa: Plug in PHYLINK support
Add support for PHYLINK within the DSA subsystem in order to support more
complex devices such as pluggable (SFP) and non-pluggable (SFF) modules, 10G
PHYs, and traditional PHYs. Using PHYLINK allows us to drop some amount of
complexity we had while probing fixed and non-fixed PHYs using Device Tree.

Because PHYLINK separates the Ethernet MAC/port configuration into different
stages, we let switch drivers implement those, and for now, we maintain
functionality by calling dsa_slave_adjust_link() during
phylink_mac_link_{up,down} which provides semantically equivalent steps.

Drivers willing to take advantage of PHYLINK should implement the phylink_mac_*
operations that DSA wraps.

We cannot quite remove the adjust_link() callback just yet, because a number of
drivers rely on that for configuring their "CPU" and "DSA" ports, this is done
dsa_port_setup_phy_of() and dsa_port_fixed_link_register_of() still.

Drivers that utilize fixed links for user-facing ports (e.g: bcm_sf2) will need
to implement phylink_mac_ops from now on to preserve functionality, since PHYLINK
*does not* create a phy_device instance for fixed links.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:06 -04:00
Russell King c9a2356f35 net: dsa: mv88e6xxx: add PHYLINK support
Add rudimentary phylink support to mv88e6xxx. This allows the driver
using user ports with fixed links to keep operating normally. User ports
with normal PHYs are not affected since the switch automatically manages
their link parameters. User facing ports which use a SFP/SFF with a
non-fixed link mode might require a call to phylink_mac_change() to
operate properly.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
[Andrew: fixed link setting after adding link polling]
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[florian: expand commit message]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:06 -04:00
Florian Fainelli c4aef9fc0d net: dsa: Eliminate dsa_slave_get_link()
Since we use PHYLIB to manage the per-port link indication, this will
also be reflected correctly in the network device's carrier state, so we
can use ethtool_op_get_link() instead.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:06 -04:00
Florian Fainelli bc0cb653a3 net: dsa: bcm_sf2: Implement phylink_mac_ops
Make the bcm_sf2 driver implement phylink_mac_ops since it needs to
support a wide variety of network interfaces: internal & external MDIO
PHYs, fixed PHYs, MoCA with MMIO link status.

A large amount of what needs to be done already exists under
bcm_sf2_sw_adjust_link() so we are essentially breaking this down into
the necessary operation for PHYLINK to work: mac_config, mac_link_up,
mac_link_down and validate. We can now entirely get rid of most of what
fixed_link_update() provided because only the link information is actually
necessary. We still have to force DUPLEX_FULL for legacy Device Tree bindings
that did not specify that before.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:05 -04:00
Florian Fainelli 11d8f3ddab net: dsa: Add PHYLINK switch operations
In preparation for adding support for PHYLINK within DSA, define a number of
operations that we will need and that switch drivers can start implementing.
Proper integration with PHYLINK will follow in subsequent patches.

We start selecting PHYLINK (which implies PHYLIB) in net/dsa/Kconfig
such that drivers can be guaranteed that this dependency is properly
taken care of and can start referencing PHYLINK helper functions without
requiring stubs or anything.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:05 -04:00
Russell King 9cd00a8aa4 net: phy: phylink: Poll link GPIOs
When using a fixed link with a link GPIO, we need to poll that GPIO to
determine link state changes. This is consistent with what fixed_phy.c does.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:05 -04:00
Florian Fainelli daab3349ad net: phy: phylink: Release link GPIO
We are not releasing the link GPIO descriptor with gpiod_put() which results in
subsequent probing to get -EBUSY when calling fwnode_get_named_gpiod(). Fix this
by doing the release in phylink_destroy().

Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:05 -04:00
Florian Fainelli bb322a9038 net: phy: phylink: Use gpiod_get_value_cansleep()
The GPIO provider for the link GPIO line might require the use of the
_cansleep() API, utilize that. This is safe to do since we run in workqueue
context.

Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:03:05 -04:00
Andrey Ignatov 1b97013bfb ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
Fix more memory leaks in ip_cmsg_send() callers. Part of them were fixed
earlier in 919483096b.

* udp_sendmsg one was there since the beginning when linux sources were
  first added to git;
* ping_v4_sendmsg one was copy/pasted in c319b4d76b.

Whenever return happens in udp_sendmsg() or ping_v4_sendmsg() IP options
have to be freed if they were allocated previously.

Add label so that future callers (if any) can use it instead of kfree()
before return that is easy to forget.

Fixes: c319b4d76b (net: ipv4: add IPPROTO_ICMP socket kind)
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 12:00:58 -04:00
Christophe JAILLET 8ccc113172 mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
Resources are not freed in the reverse order of the allocation.
Labels are also mixed-up.

Fix it and reorder code and labels in the error handling path of
'mlxsw_core_bus_device_register()'

Fixes: ef3116e540 ("mlxsw: spectrum: Register KVD resources with devlink")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 11:56:05 -04:00
David S. Miller 89dd2e752c Merge branch 'bonding-bug-fixes-and-regressions'
Debabrata Banerjee says:

====================
bonding: bug fixes and regressions

Fixes to bonding driver for balance-alb mode, suitable for stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 11:50:41 -04:00
Debabrata Banerjee 21706ee8a4 bonding: send learning packets for vlans on slave
There was a regression at some point from the intended functionality of
commit f60c3704e8 ("bonding: Fix alb mode to only use first level
vlans.")

Given the return value vlan_get_encap_level() we need to store the nest
level of the bond device, and then compare the vlan's encap level to
this. Without this, this check always fails and learning packets are
never sent.

In addition, this same commit caused a regression in the behavior of
balance_alb, which requires learning packets be sent for all interfaces
using the slave's mac in order to load balance properly. For vlan's
that have not set a user mac, we can send after checking one bit.
Otherwise we need send the set mac, albeit defeating rx load balancing
for that vlan.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 11:50:41 -04:00
Debabrata Banerjee 4fa8667ca3 bonding: do not allow rlb updates to invalid mac
Make sure multicast, broadcast, and zero mac's cannot be the output of rlb
updates, which should all be directed arps. Receive load balancing will be
collapsed if any of these happen, as the switch will broadcast.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 11:50:41 -04:00
Steven Rostedt (VMware) dc432c3d7f tracing: Fix regex_match_front() to not over compare the test string
The regex match function regex_match_front() in the tracing filter logic,
was fixed to test just the pattern length from testing the entire test
string. That is, it went from strncmp(str, r->pattern, len) to
strcmp(str, r->pattern, r->len).

The issue is that str is not guaranteed to be nul terminated, and if r->len
is greater than the length of str, it can access more memory than is
allocated.

The solution is to add a simple test if (len < r->len) return 0.

Cc: stable@vger.kernel.org
Fixes: 285caad415 ("tracing/filters: Fix MATCH_FRONT_ONLY filter matching")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-11 10:56:42 -04:00
Rafael J. Wysocki ef050374e1 Merge branches 'pm-pci' and 'pm-docs'
* pm-pci:
  PCI / PM: Check device_may_wakeup() in pci_enable_wake()
  PCI / PM: Always check PME wakeup capability for runtime wakeup support

* pm-docs:
  PM: docs: intel_pstate: fix Active Mode w/o HWP paragraph
  PM: docs: sleep-states: Fix a typo ("includig")
2018-05-11 15:17:18 +02:00
Jann Horn 0a0b987344 compat: fix 4-byte infoleak via uninitialized struct field
Commit 3a4d44b616 ("ntp: Move adjtimex related compat syscalls to
native counterparts") removed the memset() in compat_get_timex().  Since
then, the compat adjtimex syscall can invoke do_adjtimex() with an
uninitialized ->tai.

If do_adjtimex() doesn't write to ->tai (e.g.  because the arguments are
invalid), compat_put_timex() then copies the uninitialized ->tai field
to userspace.

Fix it by adding the memset() back.

Fixes: 3a4d44b616 ("ntp: Move adjtimex related compat syscalls to native counterparts")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-10 17:51:58 -07:00
Dave Airlie 72777fe797 Merge branch 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Single amdgpu regression fix

* 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/pp: Fix performance drop on Fiji
2018-05-11 10:37:17 +10:00
Daniel Borkmann a84880ef43 Merge branch 'bpf-perf-rb-libbpf'
Jakub Kicinski says:

====================
This series started out as a follow up to the bpftool perf event dumping
patches.

As suggested by Daniel patch 1 makes use of PERF_SAMPLE_TIME to simplify
code and improve accuracy of timestamps.

Remaining patches are trying to move perf event loop into libbpf as
suggested by Alexei.  One user for this new function is bpftool which
links with libbpf nicely, the other, unfortunately, is in samples/bpf.
Remaining patches make samples/bpf link against full libbpf.a (not just
a handful of objects).  Once we have full power of libbpf at our disposal
we can convert some of XDP samples to use libbpf loader instead of
bpf_load.c.  My understanding is that this is the desired direction,
at least for networking code.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:48:32 +02:00
Jakub Kicinski be5bca44aa samples: bpf: convert some XDP samples from bpf_load to libbpf
Now that we can use full powers of libbpf in BPF samples, we
should perhaps make the simplest XDP programs not depend on
bpf_load helpers.  This way newcomers will be exposed to the
recommended library from the start.

Use of bpf_prog_load_xattr() will also make it trivial to later
on request offload of the programs by simply adding ifindex to
the xattr.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:44:17 +02:00
Jakub Kicinski 17387dd5ac tools: bpf: don't complain about no kernel version for networking code
BPF programs only have to specify the target kernel version for
tracing related hooks, in networking world that requirement does
not really apply.  Loosen the checks in libbpf to reflect that.

bpf_object__open() users will continue to see the error for backward
compatibility (and because prog_type is not available there).

Error code for NULL file name is changed from ENOENT to EINVAL,
as it seems more appropriate, hopefully, that's an OK change.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:52 +02:00
Jakub Kicinski 2eb57bb8f6 tools: bpf: improve comments in libbpf.h
Fix spelling mistakes, improve and clarify the language of comments
in libbpf.h.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:52 +02:00
Jakub Kicinski d0cabbb021 tools: bpf: move the event reading loop to libbpf
There are two copies of event reading loop - in bpftool and
trace_helpers "library".  Consolidate them and move the code
to libbpf.  Return codes from trace_helpers are kept, but
renamed to include LIBBPF prefix.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:52 +02:00
Jakub Kicinski 5f9380572b samples: bpf: compile and link against full libbpf
samples/bpf currently cherry-picks object files from tools/lib/bpf
to link against.  Just compile the full library and link statically
against it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:52 +02:00
Jakub Kicinski 74662ea5d4 samples: bpf: rename struct bpf_map_def to avoid conflict with libbpf
Both tools/lib/bpf/libbpf.h and samples/bpf/bpf_load.h define their
own version of struct bpf_map_def.  The version in bpf_load.h has
more fields.  libbpf does not support inner maps and its definition
of struct bpf_map_def lacks the related fields.  Rename the definition
in bpf_load.h (samples/bpf) to avoid conflicts.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:51 +02:00
Jakub Kicinski e3687510fc tools: bpftool: use PERF_SAMPLE_TIME instead of reading the clock
Ask the kernel to include sample time in each even instead of
reading the clock.  This is also more accurate because our
clock reading was done when user space would dump the buffer,
not when sample was produced.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:51 +02:00
Prashant Bhole cb9c28ef57 bpf: sync tools bpf.h uapi header
Sync the header from include/uapi/linux/bpf.h which was updated to add
fib lookup helper function. This fixes selftests/bpf build failure.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:35:25 +02:00
Joe Stringer 91bc07c9e8 selftests/bpf: Fix bash reference in Makefile
'|& ...' is a bash 4.0+ construct which is not guaranteed to be available
when using '$(shell ...)' in a Makefile. Fall back to the more portable
'2>&1 | ...'.

Fixes the following warning during compilation:

	/bin/sh: 1: Syntax error: "&" unexpected

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:32:07 +02:00
Roi Dayan f85900c3e1 net/mlx5e: Err if asked to offload TC match on frag being first
The HW doesn't support matching on frag first/later, return error if we are
asked to offload that.

Fixes: 3f7d0eb42d ("net/mlx5e: Offload TC matching on packets being IP fragments")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-10 16:10:13 -07:00
Adi Nissim 88d725bbb4 net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
The host side reporting of VF vport statistics didn't include the VF
RDMA traffic.

Fixes: 3b751a2a41 ("net/mlx5: E-Switch, Introduce get vf statistics")
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reported-by: Ariel Almog <ariela@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-10 16:10:13 -07:00
Daniel Jurgens 1ef903bf79 net/mlx5: Free IRQs in shutdown path
Some platforms require IRQs to be free'd in the shutdown path. Otherwise
they will fail to be reallocated after a kexec.

Fixes: 8812c24d28 ("net/mlx5: Add fast unload support in shutdown flow")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-10 16:10:03 -07:00
David Howells 6b47fe1d1c rxrpc: Trace UDP transmission failure
Add a tracepoint to log transmission failure from the UDP transport socket
being used by AF_RXRPC.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-10 23:26:01 +01:00
David Howells 494337c918 rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
Add a tracepoint to log received ICMP/ICMP6 events and other error
messages.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-10 23:26:01 +01:00
David Howells 93864fc3ff rxrpc: Fix the min security level for kernel calls
Fix the kernel call initiation to set the minimum security level for kernel
initiated calls (such as from kAFS) from the sockopt value.

Fixes: 19ffa01c9c ("rxrpc: Use structs to hold connection params and protocol info")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-10 23:26:01 +01:00
David Howells f2aeed3a59 rxrpc: Fix error reception on AF_INET6 sockets
AF_RXRPC tries to turn on IP_RECVERR and IP_MTU_DISCOVER on the UDP socket
it just opened for communications with the outside world, regardless of the
type of socket.  Unfortunately, this doesn't work with an AF_INET6 socket.

Fix this by turning on IPV6_RECVERR and IPV6_MTU_DISCOVER instead if the
socket is of the AF_INET6 family.

Without this, kAFS server and address rotation doesn't work correctly
because the algorithm doesn't detect received network errors.

Fixes: 75b54cb57c ("rxrpc: Add IPv6 support")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-10 23:26:00 +01:00
David Howells c54e43d752 rxrpc: Fix missing start of call timeout
The expect_rx_by call timeout is supposed to be set when a call is started
to indicate that we need to receive a packet by that point.  This is
currently put back every time we receive a packet, but it isn't started
when we first send a packet.  Without this, the call may wait forever if
the server doesn't deign to reply.

Fix this by setting the timeout upon a successful UDP sendmsg call for the
first DATA packet.  The timeout is initiated only for initial transmission
and not for subsequent retries as we don't want the retry mechanism to
extend the timeout indefinitely.

Fixes: a158bdd324 ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-10 23:26:00 +01:00
Daniel Borkmann ff1f56d987 Merge branch 'bpf-fib-lookup-helper'
David Ahern says:

====================
Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet is expected to continue up the stack
for full processing.

The response from a FIB and neighbor lookup is either the egress index
with the bpf_fib_lookup struct filled in with dmac and gateway or
0 meaning the packet should continue up the stack. In time we can
revisit this to return the FIB lookup result errno if it is one of the
special RTN_'s such as RTN_BLACKHOLE (-EINVAL) so that the XDP
programs can do an early drop if desired.

Patches 1-6 do some more refactoring to IPv6 with the end goal of
extracting a FIB lookup function that aligns with fib_lookup for IPv4,
basically returning a fib6_info without creating a dst based entry.

Patch 7 adds lookup functions to the ipv6 stub. These are needed since
bpf is built into the kernel and ipv6 may not be built or loaded.

Patch 8 adds the bpf helper and 9 adds a sample program.

v3
- remove ETH_ALEN and in6_addr from uapi header

v2
- removed pkt_access from bpf_func_proto as noticed by Daniel
- added check in that IPv6 forwarding is enabled
- added DaveM's ack on patches 1-7 and 9 based on v1 response and
  fact that no changes were made to them in v2

v1
- updated commit messages and cover letter
- added comment to sample program noting lack of verification on
  egress device supporting XDP

RFC v2
- fixed use of foward helper from cls_act as noted by Daniel
- in patch 1 rename fib6_lookup_1 as well for consistency
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:59 +02:00
David Ahern fe616055f7 samples/bpf: Add example of ipv4 and ipv6 forwarding in XDP
Simple example of fast-path forwarding. It has a serious flaw
in not verifying the egress device index supports XDP forwarding.
If the egress device does not packets are dropped.

Take this only as a simple example of fast-path forwarding.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
David Ahern 87f5fc7e48 bpf: Provide helper to do forwarding lookups in kernel FIB table
Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet continues up the stack.

If it is to be forwarded, the forwarding can be done directly if the
neighbor is already known. If the neighbor does not exist, the first
few packets go up the stack for neighbor resolution. Once resolved, the
xdp program provides the fast path.

On successful lookup the nexthop dmac, current device smac and egress
device index are returned.

The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6
are implemented in this patch. The API includes layer 4 parameters if
the XDP program chooses to do deep packet inspection to allow compare
against ACLs implemented as FIB rules.

Header rewrite is left to the XDP program.

The lookup takes 2 flags:
- BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
  straight to the table associated with the device (expert setting for
  those looking to maximize throughput)

- BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective.
  Default is an ingress lookup.

Initial performance numbers collected by Jesper, forwarded packets/sec:

       Full stack    XDP FIB lookup    XDP Direct lookup
IPv4   1,947,969       7,074,156          7,415,333
IPv6   1,728,000       6,165,504          7,262,720

These number are single CPU core forwarding on a Broadwell
E5-1650 v4 @ 3.60GHz.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
David Ahern 65a2022e89 net/ipv6: Add fib lookup stubs for use in bpf helper
Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.

The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
David Ahern d4bea421f7 net/ipv6: Update fib6 tracepoint to take fib6_info
Similar to IPv4, IPv6 should use the FIB lookup result in the
tracepoint.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
David Ahern 138118ec96 net/ipv6: Add fib6_lookup
Add IPv6 equivalent to fib_lookup. Does a fib lookup, including rules,
but returns a FIB entry, fib6_info, rather than a dst based rt6_info.
fib6_lookup is any where from 140% (MULTIPLE_TABLES config disabled)
to 60% faster than any of the dst based lookup methods (without custom
rules) and 25% faster with custom rules (e.g., l3mdev rule).

Since the lookup function has a completely different signature,
fib6_rule_action is split into 2 paths: the existing one is
renamed __fib6_rule_action and a new one for the fib6_info path
is added. fib6_rule_action decides which to call based on the
lookup_ptr. If it is fib6_table_lookup then the new path is taken.

Caller must hold rcu lock as no reference is taken on the returned
fib entry.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
David Ahern cc065a9eb9 net/ipv6: Refactor fib6_rule_action
Move source address lookup from fib6_rule_action to a helper. It will be
used in a later patch by a second variant for fib6_rule_action.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
David Ahern 1d053da910 net/ipv6: Extract table lookup from ip6_pol_route
ip6_pol_route is used for ingress and egress FIB lookups. Refactor it
moving the table lookup into a separate fib6_table_lookup that can be
invoked separately and export the new function.

ip6_pol_route now calls fib6_table_lookup and uses the result to generate
a dst based rt6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
David Ahern 3b290a31bb net/ipv6: Rename rt6_multipath_select
Rename rt6_multipath_select to fib6_multipath_select and export it.
A later patch wants access to it similar to IPv4's fib_select_path.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
David Ahern 6454743bc1 net/ipv6: Rename fib6_lookup to fib6_node_lookup
Rename fib6_lookup to fib6_node_lookup to better reflect what it
returns. The fib6_lookup name will be used in a later patch for
an IPv6 equivalent to IPv4's fib_lookup.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
Wang YanQing 68625b7631 bpf, doc: clarification for the meaning of 'id'
For me, as a reader whose mother language isn't English, the
old words bring a little difficulty to catch the meaning, this
patch rewords the subsection in a more clarificatory way.

This patch also add blank lines as separator at two places
to improve readability.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:07:14 +02:00
David S. Miller ca3943c4aa linux-can-fixes-for-4.17-20180510
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEENrCndlB/VnAEWuH5k9IU1zQoZfEFAlr0dbcTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCT0hTXNChl8Vj0CAC3JNk7QXU+WIEwtdKZU8GW1z+gBtXb
 xpaidY91djsj/2L3xzgIF+gRL6BHFQnK+0ylqmHsk38QVijl6SsWp8LPOy35u1wN
 yGdlEZNvnajguWENUr8cnNAtkICa7b1JR6Eyqt8ZY5Ugns2G+js6tqX3FCxkpu2I
 ZRMheSJ6tQcw1SjTQgC6rhsYFipSxOEdqNzdLDo3K4Ttmb2osoHnBcVZllUwZeTu
 iSUktcCjrbv24JNkyf1HE5wt8X3zT5nnSiAPs6JW3xwcS/kw4QOsZsFx7JwoVssB
 oulQ1YGz/uT13rPh40MsPNyiG65BpQCCu3JKmgJGF4Kbmd8CO3NyzA17
 =G0cT
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-4.17-20180510' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
this is a pull request for net/master consisting of 2 patches.

Both patches are from Lukas Wunner and fix two problems found in the
hi311x CAN driver under high load situations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-10 17:57:11 -04:00
Colin Ian King 2fdae0349f qed: fix spelling mistake: "taskelt" -> "tasklet"
Trivial fix to spelling mistake in DP_VERBOSE message text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-10 17:55:55 -04:00