Commit Graph

650608 Commits

Author SHA1 Message Date
Florian Fainelli 648ea01340 net: phy: Allow pre-declaration of MDIO devices
Allow board support code to collect pre-declarations for MDIO devices by
registering them with mdiobus_register_board_info(). SPI and I2C buses
have a similar feature, we were missing this for MDIO devices, but this
is particularly useful for e.g: MDIO-connected switches which need to
provide their port layout (often board-specific) to a MDIO Ethernet
switch driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:51:46 -05:00
Florian Fainelli 71e0bbde0d net: dsa: Add support for platform data
Allow drivers to use the new DSA API with platform data. Most of the
code in net/dsa/dsa2.c does not rely so much on device_nodes and can get
the same information from platform_data instead.

We purposely do not support distributed configurations with platform
data, so drivers should be providing a pointer to a 'struct
dsa_chip_data' structure if they wish to communicate per-port layout.

Multiple CPUs port could potentially be supported and dsa_chip_data is
extended to receive up to one reference to an upstream network device
per port described by a dsa_chip_data structure.

dsa_dev_to_net_device() increments the network device's reference count,
so we intentionally call dev_put() to be consistent with the DT-enabled
path, until we have a generic notifier based solution.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:51:45 -05:00
Florian Fainelli 14b89f36ee net: dsa: Rename and export dev_to_net_device()
In preparation for using this function in net/dsa/dsa2.c, rename the function
to make its scope DSA specific, and export it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:51:45 -05:00
Andrew Lunn a23b296198 net: dsa: mv88e6xxx: Refactor remaining port setup
Move the remaining port configuration code which varies per device
into port.c, using ops were necessary. This makes
mv88e6xxx_6185_family() and mv88e6xxx_6095_family() unused, so remove
them.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:48:06 -05:00
Andrew Lunn cf3e80df13 net: dsa: mv88e6xxx: Implement Clause 45 access to SMI devices
The mv88e6390 MDIO bus controllers can support for clause 45 accesses.
The internal SERDES interfaces need this, and it is likely external
10GHz PHYs will be clause 45.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:47:11 -05:00
David S. Miller 8661a631e1 Merge branch 'mv88e6390-CMODE'
Andrew Lunn says:

====================
Set the CMODE for mv88e6390 ports

The mv88e6390 ports 9 & 10 allow there CMODE to be set. CMODE is part
of what linux defines as phy-mode. Add the needed phy-modes to linux,
and add code which will act upon the phy-mode property to configure
the switch port.

These patches have been posted before as part of a bigger patchset
which has now been broken up. I've added the received reviewed by
tags, and added device tree documentation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:34:43 -05:00
Andrew Lunn f39908d3b1 net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10
Unlike most ports, ports 9 and 10 of the 6390X family have configurable
PHY modes. Set the mode as part of adjust_link().

Ordering is important, because the SERDES interfaces connected to
ports 9 and 10 can be split and assigned to other ports. The CMODE has
to be correctly set before the SERDES interface on another port can be
configured. Such configuration is likely to be performed in
port_enable() and port_disabled(), called on slave_open() and
slave_close().

The simple case is port 9 and 10 are used for 'CPU' or 'DSA'. In this
case, the CMODE is set via a phy-mode in dsa_cpu_dsa_setup(), which is
called early in the switch setup.

When ports 9 or 10 are used as user ports, and have a fixed-phy, when
the fixed fixed-phy is attached, dsa_slave_adjust_link() is called,
which results in the adjust_link function being called, setting the
cmode. The port_enable() will for other ports will be called much
later.

When ports 9 or 10 are used as user ports and have a real phy attached
which does not use all the available SERDES interface, e.g. a 1Gbps
SGMII, there is currently no mechanism in place to set the CMODE of
the port from software. It must be hoped the stripping resistors are
correct.

At the same time, add a function to get the cmode. This will be needed
when configuring the SERDES interfaces.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:34:43 -05:00
Andrew Lunn 55601a8806 net: phy: Add 2000base-x, 2500base-x and rxaui modes
The mv88e6390 ports 9 and 10 supports some additional PHY modes. Add
these modes to the PHY core so they can be used in the binding.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:34:42 -05:00
David S. Miller 108d9c71dc Merge branch 'virtio_net-XDP-adjust_head'
John Fastabend says:

====================
XDP adjust head support for virtio

This series adds adjust head support for virtio. The following is my
test setup. I use qemu + virtio as follows,

./x86_64-softmmu/qemu-system-x86_64 \
  -hda /var/lib/libvirt/images/Fedora-test0.img \
  -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
  -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9

In order to use XDP with virtio until LRO is supported TSO must be
turned off in the host. The important fields in the above command line
are the following,

  guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off

Also note it is possible to conusme more queues than can be supported
because when XDP is enabled for retransmit XDP attempts to use a queue
per cpu. My standard queue count is 'queues=4'.

After loading the VM I run the relevant XDP test programs in,

  ./sammples/bpf

For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
with iperf (-d option to get bidirectional traffic), ping, and pktgen.
I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
the normal traffic path to the stack continues to work with XDP loaded.

It would be great to automate this soon. At the moment I do it by hand
which is starting to get tedious.

v2: original series dropped trace points after merge.
====================

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:05:13 -05:00
John Fastabend 2de2f7f40e virtio_net: XDP support for adjust_head
Add support for XDP adjust head by allocating a 256B header region
that XDP programs can grow into. This is only enabled when a XDP
program is loaded.

In order to ensure that we do not have to unwind queue headroom push
queue setup below bpf_prog_add. It reads better to do a prog ref
unwind vs another queue setup call.

At the moment this code must do a full reset to ensure old buffers
without headroom on program add or with headroom on program removal
are not used incorrectly in the datapath. Ideally we would only
have to disable/enable the RX queues being updated but there is no
API to do this at the moment in virtio so use the big hammer. In
practice it is likely not that big of a problem as this will only
happen when XDP is enabled/disabled changing programs does not
require the reset. There is some risk that the driver may either
have an allocation failure or for some reason fail to correctly
negotiate with the underlying backend in this case the driver will
be left uninitialized. I have not seen this ever happen on my test
systems and for what its worth this same failure case can occur
from probe and other contexts in virtio framework.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:05:12 -05:00
John Fastabend 9fe7bfce8b virtio_net: refactor freeze/restore logic into virtnet reset logic
For XDP we will need to reset the queues to allow for buffer headroom
to be configured. In order to do this we need to essentially run the
freeze()/restore() code path. Unfortunately the locking requirements
between the freeze/restore and reset paths are different however so
we can not simply reuse the code.

This patch refactors the code path and adds a reset helper routine.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:05:12 -05:00
John Fastabend 722d82830a virtio_net: remove duplicate queue pair binding in XDP
Factor out qp assignment.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:05:11 -05:00
John Fastabend 0354e4d19c virtio_net: factor out xdp handler for readability
At this point the do_xdp_prog is mostly if/else branches handling
the different modes of virtio_net. So remove it and handle running
the program in the per mode handlers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:05:11 -05:00
John Fastabend 473153291b virtio_net: wrap rtnl_lock in test for calling with lock already held
For XDP use case and to allow ethtool reset tests it is useful to be
able to use reset paths from contexts where rtnl lock is already
held.

This requries updating virtnet_set_queues and free_receive_bufs the
two places where rtnl_lock is taken in virtio_net. To do this we
use the following pattern,

	_foo(...) { do stuff }
	foo(...) { rtnl_lock(); _foo(...); rtnl_unlock()};

this allows us to use freeze()/restore() flow from both contexts.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:05:11 -05:00
David S. Miller 152bff3776 Merge branch 'bridge-improve-cache-utilization'
Nikolay Aleksandrov says:

====================
bridge: improve cache utilization

This is the first set which begins to deal with the bad bridge cache
access patterns. The first patch rearranges the bridge and port structs
a little so the frequently (and closely) accessed members are in the same
cache line. The second patch then moves the garbage collection to a
workqueue trying to improve system responsiveness under load (many fdbs)
and more importantly removes the need to check if the matched entry is
expired in __br_fdb_get which was a major source of false-sharing.
The third patch is a preparation for the final one which
If properly configured, i.e. ports bound to CPUs (thus updating "updated"
locally) then the bridge's HitM goes from 100% to 0%, but even without
binding we get a win because previously every lookup that iterated over
the hash chain caused false-sharing due to the first cache line being
used for both mac/vid and used/updated fields.

Some results from tests I've run:
(note that these were run in good conditions for the baseline, everything
 ran on a single NUMA node and there were only 3 fdbs)

1. baseline
100% Load HitM on the fdbs (between everyone who has done lookups and hit
                            one of the 3 hash chains of the communicating
                            src/dst fdbs)
Overall 5.06% Load HitM for the bridge, first place in the list

2. patched & ports bound to CPUs
0% Local load HitM, bridge is not even in the c2c report list
Also there's 3% consistent improvement in netperf tests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:53:14 -05:00
Nikolay Aleksandrov 83a718d629 bridge: fdb: write to used and updated at most once per jiffy
Writing once per jiffy is enough to limit the bridge's false sharing.
After this change the bridge doesn't show up in the local load HitM stats.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:53:13 -05:00
Nikolay Aleksandrov 1214628cb1 bridge: move write-heavy fdb members in their own cache line
Fdb's used and updated fields are written to on every packet forward and
packet receive respectively. Thus if we are receiving packets from a
particular fdb, they'll cause false-sharing with everyone who has looked
it up (even if it didn't match, since mac/vid share cache line!). The
"used" field is even worse since it is updated on every packet forward
to that fdb, thus the standard config where X ports use a single gateway
results in 100% fdb false-sharing. Note that this patch does not prevent
the last scenario, but it makes it better for other bridge participants
which are not using that fdb (and are only doing lookups over it).
The point is with this move we make sure that only communicating parties
get the false-sharing, in a later patch we'll show how to avoid that too.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:53:13 -05:00
Nikolay Aleksandrov f7cdee8a79 bridge: move to workqueue gc
Move the fdb garbage collector to a workqueue which fires at least 10
milliseconds apart and cleans chain by chain allowing for other tasks
to run in the meantime. When having thousands of fdbs the system is much
more responsive. Most importantly remove the need to check if the
matched entry has expired in __br_fdb_get that causes false-sharing and
is completely unnecessary if we cleanup entries, at worst we'll get 10ms
of traffic for that entry before it gets deleted.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:53:13 -05:00
Nikolay Aleksandrov 1f90c7f347 bridge: modify bridge and port to have often accessed fields in one cache line
Move around net_bridge so the vlan fields are in the beginning since
they're checked on every packet even if vlan filtering is disabled.
For the port move flags & vlan group to the beginning, so they're in the
same cache line with the port's state (both flags and state are checked
on each packet).

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:53:13 -05:00
William Tu 63dfef75ed bpf: enable verifier to add 0 to packet ptr
The patch fixes the case when adding a zero value to the packet
pointer.  The zero value could come from src_reg equals type
BPF_K or CONST_IMM.  The patch fixes both, otherwise the verifer
reports the following error:
  [...]
    R0=imm0,min_value=0,max_value=0
    R1=pkt(id=0,off=0,r=4)
    R2=pkt_end R3=fp-12
    R4=imm4,min_value=4,max_value=4
    R5=pkt(id=0,off=4,r=4)
  269: (bf) r2 = r0     // r2 becomes imm0
  270: (77) r2 >>= 3
  271: (bf) r4 = r1     // r4 becomes pkt ptr
  272: (0f) r4 += r2    // r4 += 0
  addition of negative constant to packet pointer is not allowed

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@vmware.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:50:04 -05:00
Josef Bacik 29200c199c bpf: test for AND edge cases
These two tests are based on the work done for f23cc643f9.  The first test is
just a basic one to make sure we don't allow AND'ing negative values, even if it
would result in a valid index for the array.  The second is a cleaned up version
of the original testcase provided by Jann Horn that resulted in the commit.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:35:58 -05:00
David S. Miller 9172d2a026 Merge branch 'dsa-add-fabric-notifier'
Vivien Didelot says:

====================
net: dsa: add fabric notifier

When a switch fabric is composed of multiple switch chips, these chips
must be programmed accordingly when an event occurred on one of them.

Examples of such event include hardware bridging: when a Linux bridge
spans interconnected chips, they must be programmed to allow external
ports to ingress frames on their internal ports.

Another example is cross-chip hardware VLANs. Switch chips in-between
interconnected bridge ports must also configure a given VLAN to allow
packets to pass through them.

In order to support that, this patchset introduces a non-intrusive
notifier mechanism. It adds a notifier head in every DSA switch tree
(the said fabric), and a notifier block in every DSA switch chip.

When an even occurs, it is chained to all notifiers of the fabric.
Switch chips can react accordingly if they are cross-chip capable.

On a dynamic debug enabled system, bridging a port in a multi-chip
fabric will print something like this (ZII Rev B board):

    # brctl addif br0 lan3
    mv88e6085 0.1:00: crosschip DSA port 1.0 bridged to br0
    mv88e6085 0.4:00: crosschip DSA port 1.0 bridged to br0
    # brctl delif br0 lan3
    mv88e6085 0.1:00: crosschip DSA port 1.0 unbridged from br0
    mv88e6085 0.4:00: crosschip DSA port 1.0 unbridged from br0

Currently only bridging events are added. A patchset introducing support
for cross-chip hardware bridging configuration in mv88e6xxx will follow
right after. Then events for switchdev operations are next on the line.

We should note that non-switchdev events do not support rolling-back
switch-wide operations. We'll have to work on closer integration with
switchdev for that, like introducing new attributes or objects, to
benefit from the prepare and commit phases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:53:30 -05:00
Vivien Didelot 04d3a4c6af net: dsa: introduce bridge notifier
A slave device will now notify the switch fabric once its port is
bridged or unbridged, instead of calling directly its switch operations.

This code allows propagating cross-chip bridging events in the fabric.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:53:29 -05:00
Vivien Didelot f515f192ab net: dsa: add switch notifier
Add a notifier block per DSA switch, registered against a notifier head
in the switch fabric they belong to.

This infrastructure will allow to propagate fabric-wide events such as
port bridging, VLAN configuration, etc. If a DSA switch driver cares
about cross-chip configuration, such events can be caught.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:53:29 -05:00
Vivien Didelot c5d35cb32c net: dsa: change state setter scope
The scope of the functions inside net/dsa/slave.c must be the slave
net_device pointer. Change to state setter helper accordingly to
simplify callers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:53:29 -05:00
Vivien Didelot 9c26542685 net: dsa: rollback bridging on error
When an error is returned during the bridging of a port in a
NETDEV_CHANGEUPPER event, net/core/dev.c rolls back the operation.

Be consistent and unassign dp->bridge_dev when this happens.

In the meantime, add comments to document this behavior.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:53:28 -05:00
Vivien Didelot 8e92ab3a42 net: dsa: simplify netdevice events handling
Simplify the code handling the slave netdevice notifier call by
providing a dsa_slave_changeupper helper for NETDEV_CHANGEUPPER, and so
on (only this event is supported at the moment.)

Return NOTIFY_DONE when we did not care about an event, and NOTIFY_OK
when we were concerned but no error occurred, as the API suggests.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:53:28 -05:00
Vivien Didelot 88e4f0ca4e net: dsa: move netdevice notifier registration
Move the netdevice notifier block register code in slave.c and provide
helpers for dsa.c to register and unregister it.

At the same time, check for errors since (un)register_netdevice_notifier
may fail.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:53:28 -05:00
Arnd Bergmann 321fa4ffd9 net/mlx5e: fix another maybe-uninitialized false-positive
In commit abeffce ("net/mlx5e: Fix a -Wmaybe-uninitialized warning"), I fixed a
gcc warning for the ipv4 offload handling. Now we get the same warning for the
added ipv6 support:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:815:40: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]

We can apply the same workaround here as well.

Fixes: ce99f6b97f ("net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:35:12 -05:00
Parav Pandit d0d7b10b05 net-next: treewide use is_vlan_dev() helper function.
This patch makes use of is_vlan_dev() function instead of flag
comparison which is exactly done by is_vlan_dev() helper function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jon Maxwell <jmaxwell37@gmail.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:33:29 -05:00
Dan Carpenter 73cfb2a2e4 net/mlx4_en: fix a condition
There is a "||" vs "|" typo here so we test 0x1 instead of 0x6.

Fixes: 1f8176f735 ("net/mlx4_en: Check the enabling pptx/pprx flags in SET_PORT wrapper flow")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 12:01:06 -05:00
Bert Kenward f820c0ac6c sfc: don't rearm interrupts if busy polling
Since commit 364b605573 ("net: busy-poll: return busypolling status
to drivers"), napi_complete_done() returns a boolean that can be used
by drivers to conditionally rearm interrupts.

Testing with a 7142 shows a small latency improvement of ~100 ns.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:59:36 -05:00
Xin Long d15c9ede61 sctp: process fwd tsn chunk only when prsctp is enabled
This patch is to check if asoc->peer.prsctp_capable is set before
processing fwd tsn chunk, if not, it will return an ERROR to the
peer, just as rfc3758 section 3.3.1 demands.

Reported-by: Julian Cordes <julian.cordes@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:57:15 -05:00
David S. Miller 3bc32d0396 Merge branch 'mlxsw-cleanup-neigh-handling'
Jiri Pirko says:

====================
mlxsw: cleanup neigh handling

Ido says:

This series addresses long standing issues in the mlxsw driver
concerning neighbour reflection. It also prepares the code for follow-up
changes dealing with proper resource cleanup and nexthop reflection.

The first two patches convert the neighbour reflection code to use an
ordered workqueue, to prevent re-ordering of NEIGH_UPDATE events that
may happen following subsequent patches.

The third to fifth patches remove the ndo_neigh_{construct,destroy}
entry points from the driver, thereby relying only on NEIGH_UPDATE
events for neighbour reflection. This simplifies the code considerably.

Last patches are fallout and adjust nits in the code I noticed while
going over it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:58 -05:00
Ido Schimmel fd76d9105b mlxsw: spectrum_router: Fix typo in comment
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel 01b1aa359d mlxsw: spectrum_router: Don't read 'nud_state' without lock
We periodically ask the neighbouring system to try and resolve
neighbours that are used for nexthops, but aren't currently resolved.

However, 'nud_state' is protected by the neighbour lock, so we shouldn't
access it without taking it. Instead, we can simply check the
'connected' field of the neighbour entry, which we update upon
NEIGH_UPDATE events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel 8a0b727526 mlxsw: spectrum_router: Remove redundant check
We only add neighbour entries that are also used for nexthops to
'nexthop_neighs_list', so when iterating over this list there's no need
to check that the entry is indeed used for nexthops.

Remove the redundant check.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel a8eca32615 net: remove ndo_neigh_{construct, destroy} from stacked devices
In commit 18bfb924f0 ("net: introduce default neigh_construct/destroy
ndo calls for L2 upper devices") we added these ndos to stacked devices
such as team and bond, so that calls will be propagated to mlxsw.

However, previous commit removed the reliance on these ndos and no new
users of these ndos have appeared since above mentioned commit. We can
therefore safely remove this dead code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel 5c8802f14a mlxsw: spectrum_router: Simplify neighbour reflection
Up until now we had two interfaces for neighbour related configuration:
ndo_neigh_{construct,destroy} and NEIGH_UPDATE netevents. The ndos were
used to add and remove neighbours from the driver's cache, whereas the
netevent was used to reflect the neighbours into the device's tables.

However, if the NUD state of a neighbour isn't NUD_VALID or if the
neighbour is dead, then there's really no reason for us to keep it
inside our cache. The only exception to this rule are neighbours that
are also used for nexthops, which we periodically refresh to get them
resolved.

We can therefore eliminate the ndo entry point into the driver and
simplify the code, making it similar to the FIB reflection, which is
based solely on events. This also helps us avoid a locking issue, in
which the RIF cache was traversed without proper locking during
insertion into the neigh entry cache.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:56 -05:00
Ido Schimmel de04b6a358 mlxsw: spectrum_router: Remove unused variable
Since commit 33b1341cd1 ("mlxsw: spectrum_router: Fix handling of
neighbour structure") we no longer use destination IP for neighbour
lookup, so remove it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:56 -05:00
Ido Schimmel e60234ddb5 mlxsw: spectrum_router: Use ordered workqueue for neigh updates
We currently associate each neighbour entry with a work item, so it's
not possible to have multiple events queued for the same neighbour
entry. However, this is about to be changed so that the neighbour entry
is only resolved when the work item is scheduled.

The above can result in a mismatch between the kernel's and the device's
neighbour table, unless the associated work items are processed in the
order in which they were submitted.

Do that by migrating the NEIGH_UPDATE work items to be processed in the
ordered workqueue which was recently introduced in mlxsw in commit
a3832b3189 ("mlxsw: core: Create an ordered workqueue for FIB
offload").

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:56 -05:00
Ido Schimmel a0e4761d9b mlxsw: core: Queue work immediately instead of delaying it
We always use zero delay before queueing a work on the ordered workqueue
('mlxsw_owq'), so use work_struct directly instead of delayable work.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:55 -05:00
David S. Miller fcdc103dac linux-can-next-for-4.11-20170206
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAliYiW8THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAe/sog7Dgc9jdWCACU/2SxD8WcIcP5HZgJMxv9K9LyRC++
 qQb1SQKUiedAF0173IQAVawe105LEdZS08o8ovNCzcU0gLHmfAKysjXEnWVqpyLA
 9zOBkl5XzjFsi9KI1TVsn/pUlk3yCq8w2azmt3mi/G5yLoAE0G46pzjbpOYyz67d
 Gyoh4i9OhsQOmgpefkwtyPxlh5Xd28ohdPv8NCBb17No+1Dfs4qji1TTyVY9sqNr
 8AwRl9BmTqDvZMUPRsdXk+XqZyITGLt+kMa80hd+XFmNkoWy6EKMbFiuVjkpZqtL
 kX/Tfe5/mEiBegBvIUVkeOrA7MryNLOf0ZO+bLdNnSTJZJyIro1kAyIc
 =kgvC
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-4.11-20170206' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-02-06

this is a pull request of 16 patches for net-next/master.

The first two patches by David Jander and me add the rx-offload
framework for CAN devices to the kernel. The remaining 14 patches
convert the flexcan driver to make use of it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:08:51 -05:00
Elad Raz e158e5ef24 mlxsw: reg: Fix HTGT register length
HTGT register length is limited to 32 bytes and not 256 bytes.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:07:21 -05:00
Jingju Hou b60a00f9c5 net: mvneta: implement .set_wol and .get_wol
The mvneta itself does not support WOL, but the PHY might.
So pass the calls to the PHY

Signed-off-by: Jingju Hou <houjingj@marvell.com>
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 10:54:02 -05:00
Marc Kleine-Budde 096de07f1d can: flexcan: switch imx6 and vf610 to timestamp based offloading
This patch switches the imx6 and vf610 based SoCs from the hardware FIFO
to the timestamp based rx offloading.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:45 +01:00
Marc Kleine-Budde b3cf53e988 can: flexcan: add support for timestamp based rx-offload
The flexcan IP core has 64 mailboxes. For now they are configured for
RX as a hardware FIFO. This FIFO has a fixed depth of 6 CAN frames. In
some high load scenarios it turns out thas this buffer is too small.

In order to have a buffer larger than the 6 frames FIFO, this patch adds
support for timestamp based offloading via the generic rx-offload
infrastructure.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:44 +01:00
Marc Kleine-Budde 9eb7aa8911 can: flexcan: add quirk FLEXCAN_QUIRK_ENABLE_EACEN_RRS
In order to receive RTR frames in the non HW FIFO mode the RSS and EACEN bits
of the reg_ctrl2 have to be activated. As this has no side effect in the FIFO
mode, we do this unconditionally on cores with the reg_ctrl2.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:42 +01:00
Marc Kleine-Budde 4bd888a80b can: flexcan: activate individual RX masking and initialize reg_rximr
Modern flexcan IP cores support two RX modes. One is using the 6 fames deep
hardware FIFO, the other is using up to 64 mailboxes (in non FIFO mode). For
now only the HW FIFO mode is activated.

In order to make use of the RX mailboxes the individual RX masking feature has
to be activated, otherwise matching mailboxes are overwritten during the
reception process. This however switches on the individual RX masking, which
uses reg_rximr registers for masking.

This patch activates the individual RX masking feature unconditionally and
initializes the mask registers (reg_rximr) with 0x0 == "don't care", which
switches off any filtering.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:41 +01:00
Marc Kleine-Budde 30164759db can: flexcan: make use of rx-offload's irq_offload_fifo
This patch converts the flexcan driver to make use of the rx-offload
can_rx_offload_irq_offload_fifo() helper function. The idea is to read
the CAN frames already in the interrupt context, as the depth of the
flexcan HW FIFO is too shallow, resulting in too many missed frames.
During a normal NAPI poll the frames are the pushed into the upper
layers.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:39 +01:00