Commit Graph

1102 Commits

Author SHA1 Message Date
Vladimir Oltean 89153ed6eb net: dsa: propagate extack to .port_vlan_filtering
Some drivers can't dynamically change the VLAN filtering option, or
impose some restrictions, it would be nice to propagate this info
through netlink instead of printing it to a kernel log that might never
be read. Also netlink extack includes the module that emitted the
message, which means that it's easier to figure out which ones are
driver-generated errors as opposed to command misuse.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:38:12 -08:00
Vladimir Oltean 31046a5fd9 net: dsa: propagate extack to .port_vlan_add
Allow drivers to communicate their restrictions to user space directly,
instead of printing to the kernel log. Where the conversion would have
been lossy and things like VLAN ID could no longer be conveyed (due to
the lack of support for printf format specifier in netlink extack), I
chose to keep the messages in full form to the kernel log only, and
leave it up to individual driver maintainers to move more messages to
extack.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:38:11 -08:00
Vladimir Oltean 0a6f17c6ae net: dsa: tag_ocelot_8021q: add support for PTP timestamping
For TX timestamping, we use the felix_txtstamp method which is common
with the regular (non-8021q) ocelot tagger. This method says that skb
deferral is needed, prepares a timestamp request ID, and puts a clone of
the skb in a queue waiting for the timestamp IRQ.

felix_txtstamp is called by dsa_skb_tx_timestamp() just before the
tagger's xmit method. In the tagger xmit, we divert the packets
classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based
injection registers, and we declare them as dead towards dsa_slave_xmit.
If not PTP, we proceed with normal tag_8021q stuff.

Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is
matched to the TX timestamp retrieved from the switch's FIFO based on
the timestamp request ID, and the clone is delivered to the stack.

On RX, thanks to the VCAP IS2 rule that redirects the frames with an
EtherType for 1588 towards two destinations:
- the CPU port module (for MMIO based extraction) and
- if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port
the relevant data path processing starts in the ptp_classify_raw BPF
classifier installed by DSA in the RX data path (post tagger, which is
completely unaware that it saw a PTP packet).

This time we can't reuse the same implementation of .port_rxtstamp that
also works with the default ocelot tagger. That is because felix_rxtstamp
is given an skb with a freshly stripped DSA header, and it says "I don't
need deferral for its RX timestamp, it's right in it, let me show you";
and it just points to the header right behind skb->data, from where it
unpacks the timestamp and annotates the skb with it.

The same thing cannot happen with tag_ocelot_8021q, because for one
thing, the skb did not have an extraction frame header in the first
place, but a VLAN tag with no timestamp information. So the code paths
in felix_rxtstamp for the regular and 8021q tagger are completely
independent. With tag_8021q, the timestamp must come from the packet's
duplicate delivered to the CPU port module, but there is potentially
complex logic to be handled [ and prone to reordering ] if we were to
just start reading packets from the CPU port module, and try to match
them to the one we received over Ethernet and which needs an RX
timestamp. So we do something simple: we tell DSA "give me some time to
think" (we request skb deferral by returning false from .port_rxtstamp)
and we just drop the frame we got over Ethernet with no attempt to match
it to anything - we just treat it as a notification that there's data to
be processed from the CPU port module's queues. Then we proceed to read
the packets from those, one by one, which we deliver up the stack,
timestamped, using netif_rx - the same function that any driver would
use anyway if it needed RX timestamp deferral. So the assumption is that
we'll come across the PTP packet that triggered the CPU extraction
notification eventually, but we don't know when exactly. Thanks to the
VCAP IS2 trap/redirect rule and the exclusion of the CPU port module
from the flooding replicators, only PTP frames should be present in the
CPU port module's RX queues anyway.

There is just one conflict between the VCAP IS2 trapping rule and the
semantics of the BPF classifier. Namely, ptp_classify_raw() deems
general messages as non-timestampable, but still, those are trapped to
the CPU port module since they have an EtherType of ETH_P_1588. So, if
the "no XTR IRQ" workaround is in place, we need to run another BPF
classifier on the frames extracted over MMIO, to avoid duplicates being
sent to the stack (once over Ethernet, once over MMIO). It doesn't look
like it's possible to install VCAP IS2 rules based on keys extracted
from the 1588 frame headers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean c8c0ba4fe2 net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q
Since the tag_8021q tagger is software-defined, it has no means by
itself for retrieving hardware timestamps of PTP event messages.

Because we do want to support PTP on ocelot even with tag_8021q, we need
to use the CPU port module for that. The RX timestamp is present in the
Extraction Frame Header. And because we can't use NPI mode which redirects
the CPU queues to an "external CPU" (meaning the ARM CPU running Linux),
then we need to poll the CPU port module through the MMIO registers to
retrieve TX and RX timestamps.

Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC
without wiring the extraction IRQ line to the ARM GIC. So, if we want to
be notified of any PTP packets received on the CPU port module, we have
a problem.

There is a possible workaround, which is to use the Ethernet CPU port as
a notification channel that packets are available on the CPU port module
as well. When a PTP packet is received by the DSA tagger (without timestamp,
of course), we go to the CPU extraction queues, poll for it there, then
we drop the original Ethernet packet and masquerade the packet retrieved
over MMIO (plus the timestamp) as the original when we inject it up the
stack.

Create a quirk in struct felix is selected by the Felix driver (but not
by Seville, since that doesn't support PTP at all). We want to do this
such that the workaround is minimally invasive for future switches that
don't require this workaround.

The only traffic for which we need timestamps is PTP traffic, so add a
redirection rule to the CPU port module for this. Currently we only have
the need for PTP over L2, so redirection rules for UDP ports 319 and 320
are TBD for now.

Note that for the workaround of matching of PTP-over-Ethernet-port with
PTP-over-MMIO queues to work properly, both channels need to be
absolutely lossless. There are two parts to achieving that:
- We keep flow control enabled on the tag_8021q CPU port
- We put the DSA master interface in promiscuous mode, so it will never
  drop a PTP frame (for the profiles we are interested in, these are
  sent to the multicast MAC addresses of 01-80-c2-00-00-0e and
  01-1b-19-00-00-00).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean 7c4bb540e9 net: dsa: tag_ocelot: create separate tagger for Seville
The ocelot tagger is a hot mess currently, it relies on memory
initialized by the attached driver for basic frame transmission.
This is against all that DSA tagging protocols stand for, which is that
the transmission and reception of a DSA-tagged frame, the data path,
should be independent from the switch control path, because the tag
protocol is in principle hot-pluggable and reusable across switches
(even if in practice it wasn't until very recently). But if another
driver like dsa_loop wants to make use of tag_ocelot, it couldn't.

This was done to have common code between Felix and Ocelot, which have
one bit difference in the frame header format. Quoting from commit
67c2404922 ("net: dsa: felix: create a template for the DSA tags on
xmit"):

    Other alternatives have been analyzed, such as:
    - Create a separate tag_seville.c: too much code duplication for just 1
      bit field difference.
    - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like
      tag_brcm.c, which would have a separate .xmit function. Again, too
      much code duplication for just 1 bit field difference.
    - Allocate the template from the init function of the tag_ocelot.c
      module, instead of from the driver: couldn't figure out a method of
      accessing the correct port template corresponding to the correct
      tagger in the .xmit function.

The really interesting part is that Seville should have had its own
tagging protocol defined - it is not compatible on the wire with Ocelot,
even for that single bit. In principle, a packet generated by
DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain
way, but when booted on NXP T1040 it would look differently. The reverse
is also true: a packet generated by a Seville switch would be
interpreted incorrectly by Wireshark if it was told it was generated by
an Ocelot switch.

Actually things are a bit more nuanced. If we concentrate only on the
DSA tag, what I said above is true, but Ocelot/Seville also support an
optional DSA tag prefix, which can be short or long, and it is possible
to distinguish the two taggers based on an integer constant put in that
prefix. Nonetheless, creating a separate tagger is still justified,
since the tag prefix is optional, and without it, there is again no way
to distinguish.

Claiming backwards binary compatibility is a bit more tough, since I've
already changed the format of tag_ocelot once, in commit 5124197ce5
("net: dsa: tag_ocelot: use a short prefix on both ingress and egress").
Therefore I am not very concerned with treating this as a bugfix and
backporting it to stable kernels (which would be another mess due to the
fact that there would be lots of conflicts with the other DSA_TAG_PROTO*
definitions). It's just simpler to say that the string values of the
taggers have ABI value starting with kernel 5.12, which will be when the
changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging
goes live.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean 62bf5fde5e net: dsa: tag_ocelot: single out PTP-related transmit tag processing
There is one place where we cannot avoid accessing driver data, and that
is 2-step PTP TX timestamping, since the switch wants us to provide a
timestamp request ID through the injection header, which naturally must
come from a sequence number kept by the driver (it is generated by the
.port_txtstamp method prior to the tagger's xmit).

However, since other drivers like dsa_loop do not claim PTP support
anyway, the DSA_SKB_CB(skb)->clone will always be NULL anyway, so if we
move all PTP-related dereferences of struct ocelot and struct ocelot_port
into a separate function, we can effectively ensure that this is dead
code when the ocelot tagger is attached to non-ocelot switches, and the
stateful portion of the tagger is more self-contained.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean 40d3f295b5 net: mscc: ocelot: use common tag parsing code with DSA
The Injection Frame Header and Extraction Frame Header that the switch
prepends to frames over the NPI port is also prepended to frames
delivered over the CPU port module's queues.

Let's unify the handling of the frame headers by making the ocelot
driver call some helpers exported by the DSA tagger. Among other things,
this allows us to get rid of the strange cpu_to_be32 when transmitting
the Injection Frame Header on ocelot, since the packing API uses
network byte order natively (when "quirks" is 0).

The comments above ocelot_gen_ifh talk about setting pop_cnt to 3, and
the cpu extraction queue mask to something, but the code doesn't do it,
so we don't do it either.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean 8a678bb29b net: dsa: tag_ocelot: avoid accessing ds->priv in ocelot_rcv
Taggers should be written to do something valid irrespective of the
switch driver that they are attached to. This is even more true now,
because since the introduction of the .change_tag_protocol method, a
certain tagger is not necessarily strictly associated with a driver any
longer, and I would like to be able to test all taggers with dsa_loop in
the future.

In the case of ocelot, it needs to move the classified VLAN from the DSA
tag into the skb if the port is VLAN-aware. We can allow it to do that
by looking at the dp->vlan_filtering property, no need to invoke
structures which are specific to ocelot.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 17:31:44 -08:00
Vladimir Oltean a8b659e7ff net: dsa: act as passthrough for bridge port flags
There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be
expressed by the bridge through switchdev, and not all of them can be
emulated by DSA mid-layer API at the same time.

One possible configuration is when the bridge offloads the port flags
using a mask that has a single bit set - therefore only one feature
should change. However, DSA currently groups together unicast and
multicast flooding in the .port_egress_floods method, which limits our
options when we try to add support for turning off broadcast flooding:
do we extend .port_egress_floods with a third parameter which b53 and
mv88e6xxx will ignore? But that means that the DSA layer, which
currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will
see that .port_egress_floods is implemented, and will report that all 3
types of flooding are supported - not necessarily true.

Another configuration is when the user specifies more than one flag at
the same time, in the same netlink message. If we were to create one
individual function per offloadable bridge port flag, we would limit the
expressiveness of the switch driver of refusing certain combinations of
flag values. For example, a switch may not have an explicit knob for
flooding of unknown multicast, just for flooding in general. In that
case, the only correct thing to do is to allow changes to BR_FLOOD and
BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having
a separate .port_set_unicast_flood and .port_set_multicast_flood would
not allow the driver to possibly reject that.

Also, DSA doesn't consider it necessary to inform the driver that a
SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
just calls .port_egress_floods for the CPU port. When we'll add support
for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
problem because the flood settings will need to be held statefully in
the DSA middle layer, otherwise changing the mrouter port attribute will
impact the flooding attribute. And that's _assuming_ that the underlying
hardware doesn't have anything else to do when a multicast router
attaches to a port than flood unknown traffic to it.  If it does, there
will need to be a dedicated .port_set_mrouter anyway.

So we need to let the DSA drivers see the exact form that the bridge
passes this switchdev attribute in, otherwise we are standing in the
way. Therefore we also need to use this form of language when
communicating to the driver that it needs to configure its initial
(before bridge join) and final (after bridge leave) port flags.

The b53 and mv88e6xxx drivers are converted to the passthrough API and
their implementation of .port_egress_floods is split into two: a
function that configures unicast flooding and another for multicast.
The mv88e6xxx implementation is quite hairy, and it turns out that
the implementations of unknown unicast flooding are actually the same
for 6185 and for 6352:

behind the confusing names actually lie two individual bits:
NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2)
NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3)

so there was no reason to entangle them in the first place.

Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of
PORT_CTL0, which has the exact same bit index. I have left the
implementations separate though, for the only reason that the names are
different enough to confuse me, since I am not able to double-check with
a user manual. The multicast flooding setting for 6185 is in a different
register than for 6352 though.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
Vladimir Oltean e18f4c18ab net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes
This switchdev attribute offers a counterproductive API for a driver
writer, because although br_switchdev_set_port_flag gets passed a
"flags" and a "mask", those are passed piecemeal to the driver, so while
the PRE_BRIDGE_FLAGS listener knows what changed because it has the
"mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
value. But certain drivers can offload only certain combinations of
settings, like for example they cannot change unicast flooding
independently of multicast flooding - they must be both on or both off.
The way the information is passed to switchdev makes drivers not
expressive enough, and unable to reject this request ahead of time, in
the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
the deferred BRIDGE_FLAGS attribute, where the rejection is currently
ignored.

This patch also changes drivers to make use of the "mask" field for edge
detection when possible.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
Vladimir Oltean 5e38c15856 net: dsa: configure better brport flags when ports leave the bridge
For a DSA switch port operating in standalone mode, address learning
doesn't make much sense since that is a bridge function. In fact,
address learning even breaks setups such as this one:

   +---------------------------------------------+
   |                                             |
   | +-------------------+                       |
   | |        br0        |    send      receive  |
   | +--------+-+--------+ +--------+ +--------+ |
   | |        | |        | |        | |        | |
   | |  swp0  | |  swp1  | |  swp2  | |  swp3  | |
   | |        | |        | |        | |        | |
   +-+--------+-+--------+-+--------+-+--------+-+
          |         ^           |          ^
          |         |           |          |
          |         +-----------+          |
          |                                |
          +--------------------------------+

because if the switch has a single FDB (can offload a single bridge)
then source address learning on swp3 can "steal" the source MAC address
of swp2 from br0's FDB, because learning frames coming from swp2 will be
done twice: first on the swp1 ingress port, second on the swp3 ingress
port. So the hardware FDB will become out of sync with the software
bridge, and when swp2 tries to send one more packet towards swp1, the
ASIC will attempt to short-circuit the forwarding path and send it
directly to swp3 (since that's the last port it learned that address on),
which it obviously can't, because swp3 operates in standalone mode.

So DSA drivers operating in standalone mode should still configure a
list of bridge port flags even when they are standalone. Currently DSA
attempts to call dsa_port_bridge_flags with 0, which disables egress
flooding of unknown unicast and multicast, something which doesn't make
much sense. For the switches that implement .port_egress_floods - b53
and mv88e6xxx, it probably doesn't matter too much either, since they
can possibly inject traffic from the CPU into a standalone port,
regardless of MAC DA, even if egress flooding is turned off for that
port, but certainly not all DSA switches can do that - sja1105, for
example, can't. So it makes sense to use a better common default there,
such as "flood everything".

It should also be noted that what DSA calls "dsa_port_bridge_flags()"
is a degenerate name for just calling .port_egress_floods(), since
nothing else is implemented - not learning, in particular. But disabling
address learning, something that this driver is also coding up for, will
be supported by individual drivers once .port_egress_floods is replaced
with a more generic .port_bridge_flags.

Previous attempts to code up this logic have been in the common bridge
layer, but as pointed out by Ido Schimmel, there are corner cases that
are missed when doing that:
https://patchwork.kernel.org/project/netdevbpf/patch/20210209151936.97382-5-olteanv@gmail.com/

So, at least for now, let's leave DSA in charge of setting port flags
before and after the bridge join and leave.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
Vladimir Oltean 4c08c586ff net: switchdev: propagate extack to port attributes
When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
George McCollister bd62e6f5e6 net: dsa: xrs700x: add HSR offloading support
Add offloading for HSR/PRP (IEC 62439-3) tag insertion, tag removal
forwarding and duplication supported by the xrs7000 series switches.

Only HSR v1 and PRP v1 are supported by the xrs7000 series switches (HSR
v0 is not).

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 13:24:45 -08:00
George McCollister 18596f504a net: dsa: add support for offloading HSR
Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion
tag removal, duplicate generation and forwarding on DSA switches.

Add DSA_NOTIFIER_HSR_JOIN and DSA_NOTIFIER_HSR_LEAVE which trigger calls
to .port_hsr_join and .port_hsr_leave in the DSA driver for the switch.

The DSA switch driver should then set netdev feature flags for the
HSR/PRP operation that it offloads.
    NETIF_F_HW_HSR_TAG_INS
    NETIF_F_HW_HSR_TAG_RM
    NETIF_F_HW_HSR_FWD
    NETIF_F_HW_HSR_DUP

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 13:24:45 -08:00
David S. Miller dc9d87581d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-02-10 13:30:12 -08:00
Vladimir Oltean a324d3d48f net: dsa: make assisted_learning_on_cpu_port bypass offloaded LAG interfaces
Given the following topology, and focusing only on Box A:

         Box A
         +----------------------------------+
         | Board 1         br0              |
         |             +---------+          |
         |            /           \         |
         |            |           |         |
         |            |         bond0       |
         |            |        +-----+      |
         |192.168.1.1 |       /       \     |
         |  eno0     swp0    swp1    swp2   |
         +---|--------|-------|-------|-----+
             |        |       |       |
             +--------+       |       |
               Cable          |       |
                         Cable|       |Cable
               Cable          |       |
             +--------+       |       |
             |        |       |       |
         +---|--------|-------|-------|-----+
         |  eno0     swp0    swp1    swp2   |
         |192.168.1.2 |       \       /     |
         |            |        +-----+      |
         |            |         bond0       |
         |            |           |         |
         |            \           /         |
         |             +---------+          |
         | Board 2         br0              |
         +----------------------------------+
         Box B

The assisted_learning_on_cpu_port logic will see that swp0 is bridged
with a "foreign interface" (bond0) and will therefore install all
addresses learnt by the software bridge towards bond0 (including the
address of eno0 on Box B) as static addresses towards the CPU port.

But that's not what we want - bond0 is not really a "foreign interface"
but one we can offload including L2 forwarding from/towards it. So we
need to refine our logic for assisted learning such that, whenever we
see an address learnt on a non-DSA interface, we search through the tree
for any port that offloads that non-DSA interface.

Some confusion might arise as to why we search through the whole tree
instead of just the local switch returned by dsa_slave_dev_lower_find.
Or a different angle of the same confusion: why does
dsa_slave_dev_lower_find(br_dev) return a single dp that's under br_dev
instead of the whole list of bridged DSA ports?

To answer the second question, it should be enough to install the static
FDB entry on the CPU port of a single switch in the tree, because
dsa_port_fdb_add uses DSA_NOTIFIER_FDB_ADD which ensures that all other
switches in the tree get notified of that address, and add the entry
themselves using dsa_towards_port().

This should help understand the answer to the first question: the port
returned by dsa_slave_dev_lower_find may not be on the same switch as
the ports that offload the LAG. Nonetheless, if the driver implements
.crosschip_lag_join and .crosschip_bridge_join as mv88e6xxx does, there
still isn't any reason for trapping addresses learnt on the remote LAG
towards the CPU, and we should prevent that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:51:51 -08:00
Vladimir Oltean c0a8a9c274 net: dsa: automatically bring user ports down when master goes down
This is not fixing any actual bug that I know of, but having a DSA
interface that is up even when its lower (master) interface is down is
one of those things that just do not sound right.

Yes, DSA checks if the master is up before actually bringing the
user interface up, but nobody prevents bringing the master interface
down immediately afterwards... Then the user ports would attempt
dev_queue_xmit on an interface that is down, and wonder what's wrong.

This patch prevents that from happening. NETDEV_GOING_DOWN is the
notification emitted _before_ the master actually goes down, and we are
protected by the rtnl_mutex, so all is well.

For those of you reading this because you were doing switch testing
such as latency measurements for autonomously forwarded traffic, and you
needed a controlled environment with no extra packets sent by the
network stack, this patch breaks that, because now the user ports go
down too, which may shut down the PHY etc. But please don't do it like
that, just do instead:

tc qdisc add dev eno2 clsact
tc filter add dev eno2 egress flower action drop

Tested with two cascaded DSA switches:
$ ip link set eno2 down
sja1105 spi2.0 sw0p2: Link is Down
mscc_felix 0000:00:00.5 swp0: Link is Down
fsl_enetc 0000:00:00.2 eno2: Link is Down

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:42:57 -08:00
Vladimir Oltean 9d5ef190e5 net: dsa: automatically bring up DSA master when opening user port
DSA wants the master interface to be open before the user port is due to
historical reasons. The promiscuity of interfaces that are down used to
have issues, as referenced Lennert Buytenhek in commit df02c6ff2e
("dsa: fix master interface allmulti/promisc handling").

The bugfix mentioned there, commit b6c40d68ff ("net: only invoke
dev->change_rx_flags when device is UP"), was basically a "don't do
that" approach to working around the promiscuity while down issue.

Further work done by Vlad Yasevich in commit d2615bf450 ("net: core:
Always propagate flag changes to interfaces") has resolved the
underlying issue, and it is strictly up to the DSA and 8021q drivers
now, it is no longer mandated by the networking core that the master
interface must be up when changing its promiscuity.

From DSA's point of view, deciding to error out in dsa_slave_open
because the master isn't up is
(a) a bad user experience and
(b) knocking at an open door.
Even if there still was an issue with promiscuity while down, DSA could
still just open the master and avoid it.

Doing it this way has the additional benefit that user space can now
remove DSA-specific workarounds, like systemd-networkd with BindCarrier:
https://github.com/systemd/systemd/issues/7478

And we can finally remove one of the 2 bullets in the "Common pitfalls
using DSA setups" chapter.

Tested with two cascaded DSA switches:

$ ip link set sw0p2 up
fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode
fsl_enetc 0000:00:00.2 eno2: Link is Up - 1Gbps/Full - flow control rx/tx
mscc_felix 0000:00:00.5 swp0: configuring for fixed/sgmii link mode
mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Full - flow control off
8021q: adding VLAN 0 to HW filter on device swp0
sja1105 spi2.0 sw0p2: configuring for phy/rgmii-id link mode
IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): swp0: link becomes ready

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:42:57 -08:00
Vladimir Oltean 8fd54a73b7 net: dsa: call teardown method on probe failure
Since teardown is supposed to undo the effects of the setup method, it
should be called in the error path for dsa_switch_setup, not just in
dsa_switch_teardown.

Fixes: 5e3f847a02 ("net: dsa: Add teardown callback for drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210204163351.2929670-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 20:22:00 -08:00
Vladimir Oltean 99b8202b17 net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored
The bridge emits VLAN filtering events and quite a few others via
switchdev with orig_dev = br->dev. After the blamed commit, these events
started getting ignored.

The point of the patch was to not offload switchdev objects for ports
that didn't go through dsa_port_bridge_join, because the configuration
is unsupported:
- ports that offload a bonding/team interface go through
  dsa_port_bridge_join when that bonding/team interface is later bridged
  with another switch port or LAG
- ports that don't offload LAG don't get notified of the bridge that is
  on top of that LAG.

Sadly, a check is missing, which is that the orig_dev is equal to the
bridge device. This check is compatible with the original intention,
because ports that don't offload bridging because they use a software
LAG don't have dp->bridge_dev set.

On a semi-related note, we should not offload switchdev objects or
populate dp->bridge_dev if the driver doesn't implement .port_bridge_join
either. However there is no regression associated with that, so it can
be done separately.

Fixes: 5696c8aedf ("net: dsa: Don't offload port attributes on standalone ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Tested-by: Tobias Waldekranz <tobias@waldekranz.com>
Link: https://lore.kernel.org/r/20210202233109.1591466-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 18:25:49 -08:00
Vladimir Oltean 7c83a7c539 net: dsa: add a second tagger for Ocelot switches based on tag_8021q
There are use cases for which the existing tagger, based on the NPI
(Node Processor Interface) functionality, is insufficient.

Namely:
- Frames injected through the NPI port bypass the frame analyzer, so no
  source address learning is performed, no TSN stream classification,
  etc.
- Flow control is not functional over an NPI port (PAUSE frames are
  encapsulated in the same Extraction Frame Header as all other frames)
- There can be at most one NPI port configured for an Ocelot switch. But
  in NXP LS1028A and T1040 there are two Ethernet CPU ports. The non-NPI
  port is currently either disabled, or operated as a plain user port
  (albeit an internally-facing one). Having the ability to configure the
  two CPU ports symmetrically could pave the way for e.g. creating a LAG
  between them, to increase bandwidth seamlessly for the system.

So there is a desire to have an alternative to the NPI mode. This change
keeps the default tagger for the Seville and Felix switches as "ocelot",
but it can be changed via the following device attribute:

echo ocelot-8021q > /sys/class/<dsa-master>/dsa/tagging

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 21:25:27 -08:00
Vladimir Oltean 53da0ebaad net: dsa: allow changing the tag protocol via the "tagging" device attribute
Currently DSA exposes the following sysfs:
$ cat /sys/class/net/eno2/dsa/tagging
ocelot

which is a read-only device attribute, introduced in the kernel as
commit 98cdb48071 ("net: dsa: Expose tagging protocol to user-space"),
and used by libpcap since its commit 993db3800d7d ("Add support for DSA
link-layer types").

It would be nice if we could extend this device attribute by making it
writable:
$ echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging

This is useful with DSA switches that can make use of more than one
tagging protocol. It may be useful in dsa_loop in the future too, to
perform offline testing of various taggers, or for changing between dsa
and edsa on Marvell switches, if that is desirable.

In terms of implementation, drivers can support this feature by
implementing .change_tag_protocol, which should always leave the switch
in a consistent state: either with the new protocol if things went well,
or with the old one if something failed. Teardown of the old protocol,
if necessary, must be handled by the driver.

Some things remain as before:
- The .get_tag_protocol is currently only called at probe time, to load
  the initial tagging protocol driver. Nonetheless, new drivers should
  report the tagging protocol in current use now.
- The driver should manage by itself the initial setup of tagging
  protocol, no later than the .setup() method, as well as destroying
  resources used by the last tagger in use, no earlier than the
  .teardown() method.

For multi-switch DSA trees, error handling is a bit more complicated,
since e.g. the 5th out of 7 switches may fail to change the tag
protocol. When that happens, a revert to the original tag protocol is
attempted, but that may fail too, leaving the tree in an inconsistent
state despite each individual switch implementing .change_tag_protocol
transactionally. Since the intersection between drivers that implement
.change_tag_protocol and drivers that support D in DSA is currently the
empty set, the possibility for this error to happen is ignored for now.

Testing:

$ insmod mscc_felix.ko
[   79.549784] mscc_felix 0000:00:00.5: Adding to iommu group 14
[   79.565712] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517
$ insmod tag_ocelot.ko
$ rmmod mscc_felix.ko
$ insmod mscc_felix.ko
[   97.261724] libphy: VSC9959 internal MDIO bus: probed
[   97.267363] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0
[   97.274998] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1
[   97.282561] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2
[   97.289700] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3
[   97.599163] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[   97.862034] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[   97.950731] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
[   97.964278] 8021q: adding VLAN 0 to HW filter on device swp0
[   98.146161] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[   98.238649] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
[   98.251845] 8021q: adding VLAN 0 to HW filter on device swp1
[   98.433916] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[   98.485542] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
[   98.503584] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control rx/tx
[   98.527948] device eno2 entered promiscuous mode
[   98.544755] DSA: tree 0 setup

$ ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1): 56 data bytes
64 bytes from 10.0.0.1: seq=0 ttl=64 time=2.337 ms
64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.754 ms
^C
 -  10.0.0.1 ping statistics  -
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.754/1.545/2.337 ms

$ cat /sys/class/net/eno2/dsa/tagging
ocelot
$ cat ./test_ocelot_8021q.sh
        #!/bin/bash

        ip link set swp0 down
        ip link set swp1 down
        ip link set swp2 down
        ip link set swp3 down
        ip link set swp5 down
        ip link set eno2 down
        echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
        ip link set eno2 up
        ip link set swp0 up
        ip link set swp1 up
        ip link set swp2 up
        ip link set swp3 up
        ip link set swp5 up
$ ./test_ocelot_8021q.sh
./test_ocelot_8021q.sh: line 9: echo: write error: Protocol not available
$ rmmod tag_ocelot.ko
rmmod: can't unload module 'tag_ocelot': Resource temporarily unavailable
$ insmod tag_ocelot_8021q.ko
$ ./test_ocelot_8021q.sh
$ cat /sys/class/net/eno2/dsa/tagging
ocelot-8021q
$ rmmod tag_ocelot.ko
$ rmmod tag_ocelot_8021q.ko
rmmod: can't unload module 'tag_ocelot_8021q': Resource temporarily unavailable
$ ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1): 56 data bytes
64 bytes from 10.0.0.1: seq=0 ttl=64 time=0.953 ms
64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.787 ms
64 bytes from 10.0.0.1: seq=2 ttl=64 time=0.771 ms
$ rmmod mscc_felix.ko
[  645.544426] mscc_felix 0000:00:00.5: Link is Down
[  645.838608] DSA: tree 0 torn down
$ rmmod tag_ocelot_8021q.ko

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 21:24:39 -08:00
Vladimir Oltean 357f203bb3 net: dsa: keep a copy of the tagging protocol in the DSA switch tree
Cascading DSA switches can be done multiple ways. There is the brute
force approach / tag stacking, where one upstream switch, located
between leaf switches and the host Ethernet controller, will just
happily transport the DSA header of those leaf switches as payload.
For this kind of setups, DSA works without any special kind of treatment
compared to a single switch - they just aren't aware of each other.
Then there's the approach where the upstream switch understands the tags
it transports from its leaves below, as it doesn't push a tag of its own,
but it routes based on the source port & switch id information present
in that tag (as opposed to DMAC & VID) and it strips the tag when
egressing a front-facing port. Currently only Marvell implements the
latter, and Marvell DSA trees contain only Marvell switches.

So it is safe to say that DSA trees already have a single tag protocol
shared by all switches, and in fact this is what makes the switches able
to understand each other. This fact is also implied by the fact that
currently, the tagging protocol is reported as part of a sysfs installed
on the DSA master and not per port, so it must be the same for all the
ports connected to that DSA master regardless of the switch that they
belong to.

It's time to make this official and enforce it (yes, this also means we
won't have any "switch understands tag to some extent but is not able to
speak it" hardware oddities that we'll support in the future).

This is needed due to the imminent introduction of the dsa_switch_ops::
change_tag_protocol driver API. When that is introduced, we'll have
to notify switches of the tagging protocol that they're configured to
use. Currently the tag_ops structure pointer is held only for CPU ports.
But there are switches which don't have CPU ports and nonetheless still
need to be configured. These would be Marvell leaf switches whose
upstream port is just a DSA link. How do we inform these of their
tagging protocol setup/deletion?

One answer to the above would be: iterate through the DSA switch tree's
ports once, list the CPU ports, get their tag_ops, then iterate again
now that we have it, and notify everybody of that tag_ops. But what to
do if conflicts appear between one cpu_dp->tag_ops and another? There's
no escaping the fact that conflict resolution needs to be done, so we
can be upfront about it.

Ease our work and just keep the master copy of the tag_ops inside the
struct dsa_switch_tree. Reference counting is now moved to be per-tree
too, instead of per-CPU port.

There are many places in the data path that access master->dsa_ptr->tag_ops
and we would introduce unnecessary performance penalty going through yet
another indirection, so keep those right where they are.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 21:24:31 -08:00
Vladimir Oltean 886f8e26f5 net: dsa: document the existing switch tree notifiers and add a new one
The existence of dsa_broadcast has generated some confusion in the past:
https://www.mail-archive.com/netdev@vger.kernel.org/msg365042.html

So let's document the existing dsa_port_notify and dsa_broadcast
functions and explain when each of them should be used.

Also, in fact, the in-between function has always been there but was
lacking a name, and is the main reason for this patch: dsa_tree_notify.
Refactor dsa_broadcast to use it.

This patch also moves dsa_broadcast (a top-level function) to dsa2.c,
where it really belonged in the first place, but had no companion so it
stood with dsa_port_notify.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 21:24:30 -08:00
Vladimir Oltean 9c7caf2806 net: dsa: tag_8021q: add helpers to deduce whether a VLAN ID is RX or TX VLAN
The sja1105 implementation can be blind about this, but the felix driver
doesn't do exactly what it's being told, so it needs to know whether it
is a TX or an RX VLAN, so it can install the appropriate type of TCAM
rule.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 21:24:30 -08:00
Vladimir Oltean 2a6ef76303 net: dsa: add ops for devlink-sb
Switches that care about QoS might have hardware support for reserving
buffer pools for individual ports or traffic classes, and configuring
their sizes and thresholds. Through devlink-sb (shared buffers), this is
all configurable, as well as their occupancy being viewable.

Add the plumbing in DSA for these operations.

Individual drivers still need to call devlink_sb_register() with the
shared buffers they want to expose. A helper was not created in DSA for
this purpose (unlike, say, dsa_devlink_params_register), since in my
opinion it does not bring any benefit over plainly calling
devlink_sb_register() directly.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:34 -08:00
Vladimir Oltean 0ee2af4ebb net: dsa: set configure_vlan_while_not_filtering to true by default
As explained in commit 54a0ed0df4 ("net: dsa: provide an option for
drivers to always receive bridge VLANs"), DSA has historically been
skipping VLAN switchdev operations when the bridge wasn't in
vlan_filtering mode, but the reason why it was doing that has never been
clear. So the configure_vlan_while_not_filtering option is there merely
to preserve functionality for existing drivers. It isn't some behavior
that drivers should opt into. Ideally, when all drivers leave this flag
set, we can delete the dsa_port_skip_vlan_configuration() function.

New drivers always seem to omit setting this flag, for some reason. So
let's reverse the logic: the DSA core sets it by default to true before
the .setup() callback, and legacy drivers can turn it off. This way, new
drivers get the new behavior by default, unless they explicitly set the
flag to false, which is more obvious during review.

Remove the assignment from drivers which were setting it to true, and
add the assignment to false for the drivers that didn't previously have
it. This way, it should be easier to see how many we have left.

The following drivers: lan9303, mv88e6060 were skipped from setting this
flag to false, because they didn't have any VLAN offload ops in the
first place.

The Broadcom Starfighter 2 driver calls the common b53_switch_alloc and
therefore also inherits the configure_vlan_while_not_filtering=true
behavior.

Also, print a message through netlink extack every time a VLAN has been
skipped. This is mildly annoying on purpose, so that (a) it is at least
clear that VLANs are being skipped - the legacy behavior in itself is
confusing, and the extack should be much more difficult to miss, unlike
kernel logs - and (b) people have one more incentive to convert to the
new behavior.

No behavior change except for the added prints is intended at this time.

$ ip link add br0 type bridge vlan_filtering 0
$ ip link set sw0p2 master br0
[   60.315148] br0: port 1(sw0p2) entered blocking state
[   60.320350] br0: port 1(sw0p2) entered disabled state
[   60.327839] device sw0p2 entered promiscuous mode
[   60.334905] br0: port 1(sw0p2) entered blocking state
[   60.340142] br0: port 1(sw0p2) entered forwarding state
Warning: dsa_core: skipping configuration of VLAN. # This was the pvid
$ bridge vlan add dev sw0p2 vid 100
Warning: dsa_core: skipping configuration of VLAN.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210115231919.43834-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 17:29:40 -08:00
George McCollister 54a52823a2 dsa: add support for Arrow XRS700x tag trailer
Add support for Arrow SpeedChips XRS700x single byte tag trailer. This
is modeled on tag_trailer.c which works in a similar way.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 15:36:31 -08:00
Jakub Kicinski 1d9f03c0a1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14 18:34:50 -08:00
Tobias Waldekranz 5b60dadb71 net: dsa: tag_dsa: Support reception of packets from LAG devices
Packets ingressing on a LAG that egress on the CPU port, which are not
classified as management, will have a FORWARD tag that does not
contain the normal source device/port tuple. Instead the trunk bit
will be set, and the port field holds the LAG id.

Since the exact source port information is not available in the tag,
frames are injected directly on the LAG interface and thus do never
pass through any DSA port interface on ingress.

Management frames (TO_CPU) are not affected and will pass through the
DSA port interface as usual.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14 17:11:56 -08:00
Tobias Waldekranz 058102a6e9 net: dsa: Link aggregation support
Monitor the following events and notify the driver when:

- A DSA port joins/leaves a LAG.
- A LAG, made up of DSA ports, joins/leaves a bridge.
- A DSA port in a LAG is enabled/disabled (enabled meaning
  "distributing" in 802.3ad LACP terms).

When a LAG joins a bridge, the DSA subsystem will treat that as each
individual port joining the bridge. The driver may look at the port's
LAG device pointer to see if it is associated with any LAG, if that is
required. This is analogue to how switchdev events are replicated out
to all lower devices when reaching e.g. a LAG.

Drivers can optionally request that DSA maintain a linear mapping from
a LAG ID to the corresponding netdev by setting ds->num_lag_ids to the
desired size.

In the event that the hardware is not capable of offloading a
particular LAG for any reason (the typical case being use of exotic
modes like broadcast), DSA will take a hands-off approach, allowing
the LAG to be formed as a pure software construct. This is reported
back through the extended ACK, but is otherwise transparent to the
user.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14 17:11:56 -08:00
Tobias Waldekranz 5696c8aedf net: dsa: Don't offload port attributes on standalone ports
In a situation where a standalone port is indirectly attached to a
bridge (e.g. via a LAG) which is not offloaded, do not offload any
port attributes either. The port should behave as a standard NIC.

Previously, on mv88e6xxx, this meant that in the following setup:

     br0
     /
  team0
   / \
swp0 swp1

If vlan filtering was enabled on br0, swp0's and swp1's QMode was set
to "secure". This caused all untagged packets to be dropped, as their
default VID (0) was not loaded into the VTU.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14 17:11:55 -08:00
Oleksij Rempel c2ec5f2ecf net: dsa: add optional stats64 support
Allow DSA drivers to export stats64

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-12 20:17:09 -08:00
Vladimir Oltean 91158e1680 net: dsa: clear devlink port type before unregistering slave netdevs
Florian reported a use-after-free bug in devlink_nl_port_fill found with
KASAN:

(devlink_nl_port_fill)
(devlink_port_notify)
(devlink_port_unregister)
(dsa_switch_teardown.part.3)
(dsa_tree_teardown_switches)
(dsa_unregister_switch)
(bcm_sf2_sw_remove)
(platform_remove)
(device_release_driver_internal)
(device_links_unbind_consumers)
(device_release_driver_internal)
(device_driver_detach)
(unbind_store)

Allocated by task 31:
 alloc_netdev_mqs+0x5c/0x50c
 dsa_slave_create+0x110/0x9c8
 dsa_register_switch+0xdb0/0x13a4
 b53_switch_register+0x47c/0x6dc
 bcm_sf2_sw_probe+0xaa4/0xc98
 platform_probe+0x90/0xf4
 really_probe+0x184/0x728
 driver_probe_device+0xa4/0x278
 __device_attach_driver+0xe8/0x148
 bus_for_each_drv+0x108/0x158

Freed by task 249:
 free_netdev+0x170/0x194
 dsa_slave_destroy+0xac/0xb0
 dsa_port_teardown.part.2+0xa0/0xb4
 dsa_tree_teardown_switches+0x50/0xc4
 dsa_unregister_switch+0x124/0x250
 bcm_sf2_sw_remove+0x98/0x13c
 platform_remove+0x44/0x5c
 device_release_driver_internal+0x150/0x254
 device_links_unbind_consumers+0xf8/0x12c
 device_release_driver_internal+0x84/0x254
 device_driver_detach+0x30/0x34
 unbind_store+0x90/0x134

What happens is that devlink_port_unregister emits a netlink
DEVLINK_CMD_PORT_DEL message which associates the devlink port that is
getting unregistered with the ifindex of its corresponding net_device.
Only trouble is, the net_device has already been unregistered.

It looks like we can stub out the search for a corresponding net_device
if we clear the devlink_port's type. This looks like a bit of a hack,
but also seems to be the reason why the devlink_port_type_clear function
exists in the first place.

Fixes: 3122433eb5 ("net: dsa: Register devlink ports before calling DSA driver setup()")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian fainelli <f.fainelli@gmail.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210112004831.3778323-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-12 18:48:50 -08:00
Vladimir Oltean 07b90056cb net: dsa: unbind all switches from tree when DSA master unbinds
Currently the following happens when a DSA master driver unbinds while
there are DSA switches attached to it:

$ echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind
------------[ cut here ]------------
WARNING: CPU: 0 PID: 392 at net/core/dev.c:9507
Call trace:
 rollback_registered_many+0x5fc/0x688
 unregister_netdevice_queue+0x98/0x120
 dsa_slave_destroy+0x4c/0x88
 dsa_port_teardown.part.16+0x78/0xb0
 dsa_tree_teardown_switches+0x58/0xc0
 dsa_unregister_switch+0x104/0x1b8
 felix_pci_remove+0x24/0x48
 pci_device_remove+0x48/0xf0
 device_release_driver_internal+0x118/0x1e8
 device_driver_detach+0x28/0x38
 unbind_store+0xd0/0x100

Located at the above location is this WARN_ON:

	/* Notifier chain MUST detach us all upper devices. */
	WARN_ON(netdev_has_any_upper_dev(dev));

Other stacked interfaces, like VLAN, do indeed listen for
NETDEV_UNREGISTER on the real_dev and also unregister themselves at that
time, which is clearly the behavior that rollback_registered_many
expects. But DSA interfaces are not VLAN. They have backing hardware
(platform devices, PCI devices, MDIO, SPI etc) which have a life cycle
of their own and we can't just trigger an unregister from the DSA
framework when we receive a netdev notifier that the master unregisters.

Luckily, there is something we can do, and that is to inform the driver
core that we have a runtime dependency to the DSA master interface's
device, and create a device link where that is the supplier and we are
the consumer. Having this device link will make the DSA switch unbind
before the DSA master unbinds, which is enough to avoid the WARN_ON from
rollback_registered_many.

Note that even before the blamed commit, DSA did nothing intelligent
when the master interface got unregistered either. See the discussion
here:
https://lore.kernel.org/netdev/20200505210253.20311-1-f.fainelli@gmail.com/
But this time, at least the WARN_ON is loud enough that the
upper_dev_link commit can be blamed.

The advantage with this approach vs dev_hold(master) in the attached
link is that the latter is not meant for long term reference counting.
With dev_hold, the only thing that will happen is that when the user
attempts an unbind of the DSA master, netdev_wait_allrefs will keep
waiting and waiting, due to DSA keeping the refcount forever. DSA would
not access freed memory corresponding to the master interface, but the
unbind would still result in a freeze. Whereas with device links,
graceful teardown is ensured. It even works with cascaded DSA trees.

$ echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind
[ 1818.797546] device swp0 left promiscuous mode
[ 1819.301112] sja1105 spi2.0: Link is Down
[ 1819.307981] DSA: tree 1 torn down
[ 1819.312408] device eno2 left promiscuous mode
[ 1819.656803] mscc_felix 0000:00:00.5: Link is Down
[ 1819.667194] DSA: tree 0 torn down
[ 1819.711557] fsl_enetc 0000:00:00.2 eno2: Link is Down

This approach allows us to keep the DSA framework absolutely unchanged,
and the driver core will just know to unbind us first when the master
goes away - as opposed to the large (and probably impossible) rework
required if attempting to listen for NETDEV_UNREGISTER.

As per the documentation at Documentation/driver-api/device_link.rst,
specifying the DL_FLAG_AUTOREMOVE_CONSUMER flag causes the device link
to be automatically purged when the consumer fails to probe or later
unbinds. So we don't need to keep the consumer_link variable in struct
dsa_switch.

Fixes: 2f1e8ea726 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210111230943.3701806-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-12 18:48:40 -08:00
Vladimir Oltean 417b99bf75 net: dsa: remove obsolete comments about switchdev transactions
Now that all port object notifiers were converted to be non-transactional,
we can remove the comments that say otherwise.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 16:00:57 -08:00
Vladimir Oltean 1958d5815c net: dsa: remove the transactional logic from VLAN objects
It should be the driver's business to logically separate its VLAN
offloading into a preparation and a commit phase, and some drivers don't
need / can't do this.

So remove the transactional shim from DSA and let drivers propagate
errors directly from the .port_vlan_add callback.

It would appear that the code has worse error handling now than it had
before. DSA is the only in-kernel user of switchdev that offloads one
switchdev object to more than one port: for every VLAN object offloaded
to a user port, that VLAN is also offloaded to the CPU port. So the
"prepare for user port -> check for errors -> prepare for CPU port ->
check for errors -> commit for user port -> commit for CPU port"
sequence appears to make more sense than the one we are using now:
"offload to user port -> check for errors -> offload to CPU port ->
check for errors", but it is really a compromise. In the new way, we can
catch errors from the commit phase that we previously had to ignore.
But we have our hands tied and cannot do any rollback now: if we add a
VLAN on the CPU port and it fails, we can't do the rollback by simply
deleting it from the user port, because the switchdev API is not so nice
with us: it could have simply been there already, even with the same
flags. So we don't even attempt to rollback anything on addition error,
just leave whatever VLANs managed to get offloaded right where they are.
This should not be a problem at all in practice.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 16:00:57 -08:00
Vladimir Oltean a52b2da778 net: dsa: remove the transactional logic from MDB entries
For many drivers, the .port_mdb_prepare callback was not a good opportunity
to avoid any error condition, and they would suppress errors found during
the actual commit phase.

Where a logical separation between the prepare and the commit phase
existed, the function that used to implement the .port_mdb_prepare
callback still exists, but now it is called directly from .port_mdb_add,
which was modified to return an int code.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Linus Wallei <linus.walleij@linaro.org> # RTL8366
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 16:00:57 -08:00
Vladimir Oltean 77b61365ec net: dsa: remove the transactional logic from ageing time notifiers
Remove the shim introduced in DSA for offloading the bridge ageing time
from switchdev, by first checking whether the ageing time is within the
range limits requested by the driver.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 16:00:57 -08:00
Vladimir Oltean bae33f2b5a net: switchdev: remove the transaction structure from port attributes
Since the introduction of the switchdev API, port attributes were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.

Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceff ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.

It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.

This patch removes the struct switchdev_trans member from switchdev port
attribute notifier structures, and converts drivers to not look at this
member.

In part, this patch contains a revert of my previous commit 2e554a7a5d
("net: dsa: propagate switchdev vlan_filtering prepare phase to
drivers").

For the most part, the conversion was trivial except for:
- Rocker's world implementation based on Broadcom OF-DPA had an odd
  implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
  done mechanically, by pasting the implementation twice, then only
  keeping the code that would get executed during prepare phase on top,
  then only keeping the code that gets executed during the commit phase
  on bottom, then simplifying the resulting code until this was obtained.
- DSA's offloading of STP state, bridge flags, VLAN filtering and
  multicast router could be converted right away. But the ageing time
  could not, so a shim was introduced and this was left for a further
  commit.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 16:00:57 -08:00
Vladimir Oltean ffb68fc58e net: switchdev: remove the transaction structure from port object notifiers
Since the introduction of the switchdev API, port objects were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.

Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceff ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.

It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.

This patch removes the struct switchdev_trans member from switchdev port
object notifier structures, and converts drivers to not look at this
member.

Where driver conversion is trivial (like in the case of the Marvell
Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
done in this patch.

Where driver conversion needs more attention (DSA, Mellanox Spectrum),
the conversion is left for subsequent patches and here we only fake the
prepare/commit phases at a lower level, just not in the switchdev
notifier itself.

Where the code has a natural structure that is best left alone as a
preparation and a commit phase (as in the case of the Ocelot switch),
that structure is left in place, just made to not depend upon the
switchdev transactional model.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 16:00:56 -08:00
Vladimir Oltean b7a9e0da2d net: switchdev: remove vid_begin -> vid_end range from VLAN objects
The call path of a switchdev VLAN addition to the bridge looks something
like this today:

        nbp_vlan_init
        |  __br_vlan_set_default_pvid
        |  |                       |
        |  |    br_afspec          |
        |  |        |              |
        |  |        v              |
        |  | br_process_vlan_info  |
        |  |        |              |
        |  |        v              |
        |  |   br_vlan_info        |
        |  |       / \            /
        |  |      /   \          /
        |  |     /     \        /
        |  |    /       \      /
        v  v   v         v    v
      nbp_vlan_add   br_vlan_add ------+
       |              ^      ^ |       |
       |             /       | |       |
       |            /       /  /       |
       \ br_vlan_get_master/  /        v
        \        ^        /  /  br_vlan_add_existing
         \       |       /  /          |
          \      |      /  /          /
           \     |     /  /          /
            \    |    /  /          /
             \   |   /  /          /
              v  |   | v          /
              __vlan_add         /
                 / |            /
                /  |           /
               v   |          /
   __vlan_vid_add  |         /
               \   |        /
                v  v        v
      br_switchdev_port_vlan_add

The ranges UAPI was introduced to the bridge in commit bdced7ef78
("bridge: support for multiple vlans and vlan ranges in setlink and
dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
have always been passed one by one, through struct bridge_vlan_info
tmp_vinfo, to br_vlan_info. So the range never went too far in depth.

Then Scott Feldman introduced the switchdev_port_bridge_setlink function
in commit 47f8328bb1 ("switchdev: add new switchdev bridge setlink").
That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
full use of the range. But switchdev_port_bridge_setlink was called like
this:

br_setlink
-> br_afspec
-> switchdev_port_bridge_setlink

Basically, the switchdev and the bridge code were not tightly integrated.
Then commit 41c498b935 ("bridge: restore br_setlink back to original")
came, and switchdev drivers were required to implement
.ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.

In the meantime, commits such as 0944d6b5a2 ("bridge: try switchdev op
first in __vlan_vid_add/del") finally made switchdev penetrate the
br_vlan_info() barrier and start to develop the call path we have today.
But remember, br_vlan_info() still receives VLANs one by one.

Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
29ab586c3d ("net: switchdev: Remove bridge bypass support from
switchdev") so that drivers would not implement .ndo_bridge_setlink any
longer. The switchdev_port_bridge_setlink also got deleted.
This refactoring removed the parallel bridge_setlink implementation from
switchdev, and left the only switchdev VLAN objects to be the ones
offloaded from __vlan_vid_add (basically RX filtering) and  __vlan_add
(the latter coming from commit 9c86ce2c1a ("net: bridge: Notify about
bridge VLANs")).

That is to say, today the switchdev VLAN object ranges are not used in
the kernel. Refactoring the above call path is a bit complicated, when
the bridge VLAN call path is already a bit complicated.

Let's go off and finish the job of commit 29ab586c3d by deleting the
bogus iteration through the VLAN ranges from the drivers. Some aspects
of this feature never made too much sense in the first place. For
example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
flag supposed to mean, when a port can obviously have a single pvid?
This particular configuration _is_ denied as of commit 6623c60dc2
("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
perspective, the driver still has to play pretend, and only offload the
vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
modify the flags of another, completely unrelated, switchdev VLAN
object! (a VLAN that is PVID will invalidate the PVID flag from whatever
other VLAN had previously been offloaded with switchdev and had that
flag. Yet switchdev never notifies about that change, drivers are
supposed to guess).

Nonetheless, having a VLAN range in the API makes error handling look
scarier than it really is - unwinding on errors and all of that.
When in reality, no one really calls this API with more than one VLAN.
It is all unnecessary complexity.

And despite appearing pretentious (two-phase transactional model and
all), the switchdev API is really sloppy because the VLAN addition and
removal operations are not paired with one another (you can add a VLAN
100 times and delete it just once). The bridge notifies through
switchdev of a VLAN addition not only when the flags of an existing VLAN
change, but also when nothing changes. There are switchdev drivers out
there who don't like adding a VLAN that has already been added, and
those checks don't really belong at driver level. But the fact that the
API contains ranges is yet another factor that prevents this from being
addressed in the future.

Of the existing switchdev pieces of hardware, it appears that only
Mellanox Spectrum supports offloading more than one VLAN at a time,
through mlxsw_sp_port_vlan_set. I have kept that code internal to the
driver, because there is some more bookkeeping that makes use of it, but
I deleted it from the switchdev API. But since the switchdev support for
ranges has already been de facto deleted by a Mellanox employee and
nobody noticed for 4 years, I'm going to assume it's not a biggie.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 16:00:56 -08:00
Vladimir Oltean 4b9c935898 net: dsa: dsa_legacy_fdb_{add,del} can be static
Introduced in commit 37b8da1a3c ("net: dsa: Move FDB add/del
implementation inside DSA") in net/dsa/legacy.c, these functions were
moved again to slave.c as part of commit 2a93c1a365 ("net: dsa: Allow
compiling out legacy support"), before actually deleting net/dsa/slave.c
in 93e86b3bc8 ("net: dsa: Remove legacy probing support"). Along with
that movement there should have been a deletion of the prototypes from
dsa_priv.h, they are not useful.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210108233054.1222278-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 18:05:52 -08:00
Vladimir Oltean 1dbb130281 net: dsa: remove the DSA specific notifiers
This effectively reverts commit 60724d4bae ("net: dsa: Add support for
DSA specific notifiers"). The reason is that since commit 2f1e8ea726
("net: dsa: link interfaces with the DSA master to get rid of lockdep
warnings"), it appears that there is a generic way to achieve the same
purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was
converted to use the generic notifiers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:42:07 -08:00
Vladimir Oltean a5e3c9ba92 net: dsa: export dsa_slave_dev_check
Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when
they are enslaved to e.g. a bridge by calling netif_is_bridge_master().

Export this helper from DSA to get the equivalent functionality of
determining whether the upper interface of a CHANGEUPPER notifier is a
DSA switch interface or not.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:42:07 -08:00
Vladimir Oltean f46b9b8ee8 net: dsa: move the Broadcom tag information in a separate header file
It is a bit strange to see something as specific as Broadcom SYSTEMPORT
bits in the main DSA include file. Move these away into a separate
header, and have the tagger and the SYSTEMPORT driver include them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:42:07 -08:00
Vladimir Oltean d5f19486ce net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors
Some DSA switches (and not only) cannot learn source MAC addresses from
packets injected from the CPU. They only perform hardware address
learning from inbound traffic.

This can be problematic when we have a bridge spanning some DSA switch
ports and some non-DSA ports (which we'll call "foreign interfaces" from
DSA's perspective).

There are 2 classes of problems created by the lack of learning on
CPU-injected traffic:
- excessive flooding, due to the fact that DSA treats those addresses as
  unknown
- the risk of stale routes, which can lead to temporary packet loss

To illustrate the second class, consider the following situation, which
is common in production equipment (wireless access points, where there
is a WLAN interface and an Ethernet switch, and these form a single
bridging domain).

 AP 1:
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
       |                                                       ^        ^
       |                                                       |        |
       |                                                       |        |
       |                                                    Client A  Client B
       |
       |
       |
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 AP 2

- br0 of AP 1 will know that Clients A and B are reachable via wlan0
- the hardware fdb of a DSA switch driver today is not kept in sync with
  the software entries on other bridge ports, so it will not know that
  clients A and B are reachable via the CPU port UNLESS the hardware
  switch itself performs SA learning from traffic injected from the CPU.
  Nonetheless, a substantial number of switches don't.
- the hardware fdb of the DSA switch on AP 2 may autonomously learn that
  Client A and B are reachable through swp0. Therefore, the software br0
  of AP 2 also may or may not learn this. In the example we're
  illustrating, some Ethernet traffic has been going on, and br0 from AP
  2 has indeed learnt that it can reach Client B through swp0.

One of the wireless clients, say Client B, disconnects from AP 1 and
roams to AP 2. The topology now looks like this:

 AP 1:
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
       |                                                            ^
       |                                                            |
       |                                                         Client A
       |
       |
       |                                                         Client B
       |                                                            |
       |                                                            v
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 AP 2

- br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
- br0 of AP 1 will (possibly) know that Client B has left wlan0. There
  are cases where it might never find out though. Either way, DSA today
  does not process that notification in any way.
- the hardware FDB of the DSA switch on AP 1 may learn autonomously that
  Client B can be reached via swp0, if it receives any packet with
  Client 1's source MAC address over Ethernet.
- the hardware FDB of the DSA switch on AP 2 still thinks that Client B
  can be reached via swp0. It does not know that it has roamed to wlan0,
  because it doesn't perform SA learning from the CPU port.

Now Client A contacts Client B.
AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
segment.
AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
Hairpinning is disabled => drop.

This problem comes from the fact that these switches have a 'blind spot'
for addresses coming from software bridging. The generic solution is not
to assume that hardware learning can be enabled somehow, but to listen
to more bridge learning events. It turns out that the bridge driver does
learn in software from all inbound frames, in __br_handle_local_finish.
A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
addresses serviced by the bridge on 'foreign' interfaces. The software
bridge also does the right thing on migration, by notifying that the old
entry is deleted, so that does not need to be special-cased in DSA. When
it is deleted, we just need to delete our static FDB entry towards the
CPU too, and wait.

The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
events received on its own interfaces, such as static FDB entries.

Luckily we can change that, and DSA can listen to all switchdev FDB
add/del events in the system and figure out if those events were emitted
by a bridge that spans at least one of DSA's own ports. In case that is
true, DSA will also offload that address towards its own CPU port, in
the eventuality that there might be bridge clients attached to the DSA
switch who want to talk to the station connected to the foreign
interface.

In terms of implementation, we need to keep the fdb_info->added_by_user
check for the case where the switchdev event was targeted directly at a
DSA switch port. But we don't need to look at that flag for snooped
events. So the check is currently too late, we need to move it earlier.
This also simplifies the code a bit, since we avoid uselessly allocating
and freeing switchdev_work.

We could probably do some improvements in the future. For example,
multi-bridge support is rudimentary at the moment. If there are two
bridges spanning a DSA switch's ports, and both of them need to service
the same MAC address, then what will happen is that the migration of one
of those stations will trigger the deletion of the FDB entry from the
CPU port while it is still used by other bridge. That could be improved
with reference counting but is left for another time.

This behavior needs to be enabled at driver level by setting
ds->assisted_learning_on_cpu_port = true. This is because we don't want
to inflict a potential performance penalty (accesses through
MDIO/I2C/SPI are expensive) to hardware that really doesn't need it
because address learning on the CPU port works there.

Reported-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:34:46 -08:00
Vladimir Oltean 5fb4a451a8 net: dsa: exit early in dsa_slave_switchdev_event if we can't program the FDB
Right now, the following would happen for a switch driver that does not
implement .port_fdb_add or .port_fdb_del.

dsa_slave_switchdev_event returns NOTIFY_OK and schedules:
-> dsa_slave_switchdev_event_work
   -> dsa_port_fdb_add
      -> dsa_port_notify(DSA_NOTIFIER_FDB_ADD)
         -> dsa_switch_fdb_add
            -> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP;
   -> an error is printed with dev_dbg, and
      dsa_fdb_offload_notify(switchdev_work) is not called.

We can avoid scheduling the worker for nothing and say NOTIFY_DONE.
Because we don't call dsa_fdb_offload_notify, the static FDB entry will
remain just in the software bridge.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:34:46 -08:00
Vladimir Oltean 447d290a58 net: dsa: move switchdev event implementation under the same switch/case statement
We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
events even for interfaces where dsa_slave_dev_check returns false, so
we need that check inside the switch-case statement for SWITCHDEV_FDB_*.

This movement also avoids a useless allocation / free of switchdev_work
on the untreated "default event" case.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:34:46 -08:00
Vladimir Oltean c4bb76a9a0 net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work
Currently DSA doesn't add FDB entries on the CPU port, because it only
does so through switchdev, which is associated with a net_device, and
there are none of those for the CPU port.

But actually FDB addresses on the CPU port have some use cases of their
own, if the switchdev operations are initiated from within the DSA
layer. There is just one problem with the existing code: it passes a
structure in dsa_switchdev_event_work which was retrieved directly from
switchdev, so it contains a net_device. We need to generalize the
contents to something that covers the CPU port as well: the "ds, port"
tuple is fine for that.

Note that the new procedure for notifying the successful FDB offload is
inspired from the rocker model.

Also, nothing was being done if added_by_user was false. Let's check for
that a lot earlier, and don't actually bother to schedule the worker
for nothing.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:34:45 -08:00
Vladimir Oltean 2fd186501b net: dsa: be louder when a non-legacy FDB operation fails
The dev_close() call was added in commit c9eb3e0f87 ("net: dsa: Add
support for learning FDB through notification") "to indicate inconsistent
situation" when we could not delete an FDB entry from the port.

bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master

It is a bit drastic and at the same time not helpful if the above fails
to only print with netdev_dbg log level, but on the other hand to bring
the interface down.

So increase the verbosity of the error message, and drop dev_close().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:34:45 -08:00
Rafał Miłecki 8209f5bc3b net: dsa: print error on invalid port index
Looking for an -EINVAL all over the dsa code could take hours for
inexperienced DSA users.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210106090915.21439-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-06 16:21:08 -08:00
Rasmus Villemoes bdc40a3f4b net: dsa: print the MTU value that could not be set
These warnings become somewhat more informative when they include the
MTU value that could not be set and not just the errno.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/r/20201205133944.10182-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-08 11:24:07 -08:00
Kurt Kanzenbach 8551fad63c net: dsa: tag_hellcreek: Cleanup includes
Remove unused and add needed includes. No functional change.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-23 16:57:21 -08:00
Christian Eggers 30abc9cd9c net: dsa: avoid potential use-after-free error
If dsa_switch_ops::port_txtstamp() returns false, clone will be freed
immediately. Shouldn't store a pointer to freed memory.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201119110906.25558-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:02:50 -08:00
Tobias Waldekranz 13f49b6f26 net: dsa: tag_dsa: Use a consistent comment style
Use a consistent style of one-line/multi-line comments throughout the
file.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 09:16:12 -08:00
Tobias Waldekranz 469ee5fe73 net: dsa: tag_dsa: Unify regular and ethertype DSA taggers
Ethertype DSA encodes exactly the same information in the DSA tag as
the non-ethertype variety. So refactor out the common parts and reuse
them for both protocols.

This is ensures tag parsing and generation is always consistent across
all mv88e6xxx chips.

While we are at it, explicitly deal with all possible CPU codes on
receive, making sure to set offload_fwd_mark as appropriate.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 09:16:12 -08:00
Tobias Waldekranz e468d141b9 net: dsa: tag_dsa: Allow forwarding of redirected IGMP traffic
When receiving an IGMP/MLD frame with a TO_CPU tag, the switch has not
performed any forwarding of it. This means that we should not set the
offload_fwd_mark on the skb, in case a software bridge wants it
forwarded.

This is a port of:

1ed9ec9b08 ("dsa: Allow forwarding of redirected IGMP traffic")

Which corrected the issue for chips using EDSA tags, but not for those
using regular DSA tags.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 09:16:11 -08:00
Heiner Kallweit 6a90062879 net: dsa: use net core stats64 handling
Use netdev->tstats instead of a member of dsa_slave_priv for storing
a pointer to the per-cpu counters. This allows us to use core
functionality for statistics handling.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-09 17:50:27 -08:00
Vladimir Oltean e358bef7c3 net: dsa: Give drivers the chance to veto certain upper devices
Some switches rely on unique pvids to ensure port separation in
standalone mode, because they don't have a port forwarding matrix
configurable in hardware. So, setups like a group of 2 uppers with the
same VLAN, swp0.100 and swp1.100, will cause traffic tagged with VLAN
100 to be autonomously forwarded between these switch ports, in spite
of there being no bridge between swp0 and swp1.

These drivers need to prevent this from happening. They need to have
VLAN filtering enabled in standalone mode (so they'll drop frames tagged
with unknown VLANs) and they can only accept an 8021q upper on a port as
long as it isn't installed on any other port too. So give them the
chance to veto bad user requests.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
[Kurt: Pass info instead of ptr]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-05 14:04:49 -08:00
Kurt Kanzenbach 01ef09caad net: dsa: Add tag handling for Hirschmann Hellcreek switches
The Hirschmann Hellcreek TSN switches have a special tagging protocol for frames
exchanged between the CPU port and the master interface. The format is a one
byte trailer indicating the destination or origin port.

It's quite similar to the Micrel KSZ tagging. That's why the implementation is
based on that code.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-05 14:04:49 -08:00
Vladimir Oltean 86c4ad9a78 net: dsa: tag_ar9331: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: Per Forlin <per.forlin@axis.com>
Cc: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Oleksij Rempel <linux@rempel-privat.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:17 -08:00
Vladimir Oltean 9b9826ae11 net: dsa: tag_gswip: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

This one is interesting, the DSA tag is 8 bytes on RX and 4 bytes on TX.
Because DSA is unaware of asymmetrical tag lengths, the overhead/needed
headroom is declared as 8 bytes and therefore 4 bytes larger than it
needs to be. If this becomes a problem, and the GSWIP driver can't be
converted to a uniform header length, we might need to make DSA aware of
separate RX/TX overhead values.

Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean 952a063450 net: dsa: tag_dsa: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Similar to the EtherType DSA tagger, the old Marvell tagger can
transform an 802.1Q header if present into a DSA tag, so there is no
headroom required in that case. But we are ensuring that it exists,
regardless (practically speaking, the headroom must be 4 bytes larger
than it needs to be).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean 2f0d030c5f net: dsa: tag_brcm: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean c6c4e1237d net: dsa: tag_edsa: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Note that the VLAN code path needs a smaller extra headroom than the
regular EtherType DSA path. That isn't a problem, because this tagger
declares the larger tag length (8 bytes vs 4) as the protocol overhead,
so we are covered in both cases.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean 6ed94135f5 net: dsa: tag_lan9303: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean 941f66beb7 net: dsa: tag_mtk: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean 9c5c3bd005 net: dsa: tag_ocelot: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean 9bbda29ae1 net: dsa: tag_qca: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: John Crispin <john@phrozen.org>
Cc: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Christian Eggers ef3f72fee2 net: dsa: trailer: don't allocate additional memory for padding/tagging
The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Christian Eggers 88fda8eefd net: dsa: tag_ksz: don't allocate additional memory for padding/tagging
The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean a3b0b64797 net: dsa: implement a central TX reallocation procedure
At the moment, taggers are left with the task of ensuring that the skb
headers are writable (which they aren't, if the frames were cloned for
TX timestamping, for flooding by the bridge, etc), and that there is
enough space in the skb data area for the DSA tag to be pushed.

Moreover, the life of tail taggers is even harder, because they need to
ensure that short frames have enough padding, a problem that normal
taggers don't have.

The principle of the DSA framework is that everything except for the
most intimate hardware specifics (like in this case, the actual packing
of the DSA tag bits) should be done inside the core, to avoid having
code paths that are very rarely tested.

So provide a TX reallocation procedure that should cover the known needs
of DSA today.

Note that this patch also gives the network stack a good hint about the
headroom/tailroom it's going to need. Up till now it wasn't doing that.
So the reallocation procedure should really be there only for the
exceptional cases, and for cloned packets which need to be unshared.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Christian Eggers <ceggers@arri.de> # For tail taggers only
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:15 -08:00
Christian Eggers bc7e343dbd net: dsa: tag_ksz: KSZ8795 and KSZ9477 also use tail tags
The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and
KSZ9893 switches also use tail tags.

Fixes: 7a6ffe764b ("net: dsa: point out the tail taggers")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201016171603.10587-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 17:32:50 -07:00
Heiner Kallweit a0d2698101 net: dsa: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/b6047017-8226-6b7e-a3cd-064e69fdfa27@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13 17:33:49 -07:00
Vladimir Oltean ea440cd2d9 net: dsa: tag_ocelot: use VLAN information from tagging header when available
When the Extraction Frame Header contains a valid classified VLAN, use
that instead of the VLAN header present in the packet.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11 11:19:04 -07:00
Vladimir Oltean 2e554a7a5d net: dsa: propagate switchdev vlan_filtering prepare phase to drivers
A driver may refuse to enable VLAN filtering for any reason beyond what
the DSA framework cares about, such as:
- having tc-flower rules that rely on the switch being VLAN-aware
- the particular switch does not support VLAN, even if the driver does
  (the DSA framework just checks for the presence of the .port_vlan_add
  and .port_vlan_del pointers)
- simply not supporting this configuration to be toggled at runtime

Currently, when a driver rejects a configuration it cannot support, it
does this from the commit phase, which triggers various warnings in
switchdev.

So propagate the prepare phase to drivers, to give them the ability to
refuse invalid configurations cleanly and avoid the warnings.

Since we need to modify all function prototypes and check for the
prepare phase from within the drivers, take that opportunity and move
the existing driver restrictions within the prepare phase where that is
possible and easy.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Landen Chao <Landen.Chao@mediatek.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Jonathan McDowell <noodles@earth.li>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05 05:56:48 -07:00
Andrew Lunn 08156ba430 net: dsa: Add devlink port regions support to DSA
Allow DSA drivers to make use of devlink port regions, via simple
wrappers.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:38:53 -07:00
Andrew Lunn 3122433eb5 net: dsa: Register devlink ports before calling DSA driver setup()
DSA drivers want to create regions on devlink ports as well as the
devlink device instance, in order to export registers and other tables
per port. To keep all this code together in the drivers, have the
devlink ports registered early, so the setup() method can setup both
device and port devlink regions.

v3:
Remove dp->setup
Move common code out of switch statement.
Fix wrong goto

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:38:53 -07:00
Andrew Lunn f15ec13a96 net: dsa: Make use of devlink port flavour unused
If a port is unused, still create a devlink port for it, but set the
flavour to unused. This allows us to attach devlink regions to the
port, etc.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:38:52 -07:00
Florian Fainelli 3a68844dd2 net: dsa: Utilize __vlan_find_dev_deep_rcu()
Now that we are guaranteed that dsa_untag_bridge_pvid() is called after
eth_type_trans() we can utilize __vlan_find_dev_deep_rcu() which will
take care of finding an 802.1Q upper on top of a bridge master.

A common use case, prior to 12a1526d067 ("net: dsa: untag the bridge
pvid from rx skbs") was to configure a bridge 802.1Q upper like this:

ip link add name br0 type bridge vlan_filtering 0
ip link add link br0 name br0.1 type vlan id 1

in order to pop the default_pvid VLAN tag.

With this change we restore that behavior while still allowing the DSA
receive path to automatically pop the VLAN tag.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02 13:36:07 -07:00
Florian Fainelli a348292b63 net: dsa: Obtain VLAN protocol from skb->protocol
Now that dsa_untag_bridge_pvid() is called after eth_type_trans() we are
guaranteed that skb->protocol will be set to a correct value, thus
allowing us to avoid calling vlan_eth_hdr().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02 13:36:07 -07:00
Florian Fainelli 1c5ad5a940 net: dsa: b53: Set untag_bridge_pvid
Indicate to the DSA receive path that we need to untage the bridge PVID,
this allows us to remove the dsa_untag_bridge_pvid() calls from
net/dsa/tag_brcm.c.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02 13:36:07 -07:00
Florian Fainelli 1dc0408cdf net: dsa: Call dsa_untag_bridge_pvid() from dsa_switch_rcv()
When a DSA switch driver needs to call dsa_untag_bridge_pvid(), it can
set dsa_switch::untag_brige_pvid to indicate this is necessary.

This is a pre-requisite to making sure that we are always calling
dsa_untag_bridge_pvid() after eth_type_trans() has been called.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02 13:36:07 -07:00
Vladimir Oltean 300fd579b2 net: dsa: tag_rtl4_a: use the generic flow dissector procedure
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.

Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:59 -07:00
Vladimir Oltean e665297983 net: dsa: tag_sja1105: use a custom flow dissector procedure
The sja1105 is a bit of a special snowflake, in that not all frames are
transmitted/received in the same way. L2 link-local frames are received
with the source port/switch ID information put in the destination MAC
address. For the rest, a tag_8021q header is used. So only the latter
frames displace the rest of the headers and need to use the generic flow
dissector procedure.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:59 -07:00
Vladimir Oltean 6b04f171dc net: dsa: tag_qca: use the generic flow dissector procedure
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.

Cc: John Crispin <john@phrozen.org>
Cc: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:59 -07:00
Vladimir Oltean b1af365637 net: dsa: tag_mtk: use the generic flow dissector procedure
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:59 -07:00
Vladimir Oltean 742b2e1951 net: dsa: tag_edsa: use the generic flow dissector procedure
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:59 -07:00
Vladimir Oltean 11f5011189 net: dsa: tag_dsa: use the generic flow dissector procedure
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:59 -07:00
Vladimir Oltean f569ad5257 net: dsa: tag_brcm: use generic flow dissector procedure
There are 2 Broadcom tags in use, one places the DSA tag before the
Ethernet destination MAC address, and the other before the EtherType.
Nonetheless, both displace the rest of the headers, so this tagger can
use the generic flow dissector procedure which accounts for that.

The ASCII art drawing is a good reference though, so keep it but move it
somewhere else.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:59 -07:00
Vladimir Oltean 7a6ffe764b net: dsa: point out the tail taggers
The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and
KSZ9893 switches also use tail tags.

Tell that to the DSA core, since this makes a difference for the flow
dissector. Most switches break the parsing of frame headers, but these
ones don't, so no flow dissector adjustment needs to be done for them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:59 -07:00
Vladimir Oltean 2e8cb1b3db net: dsa: make the .flow_dissect tagger callback return void
There is no tagger that returns anything other than zero, so just change
the return type appropriately.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:59 -07:00
Vladimir Oltean 5124197ce5 net: dsa: tag_ocelot: use a short prefix on both ingress and egress
There are 2 goals that we follow:

- Reduce the header size
- Make the header size equal between RX and TX

The issue that required long prefix on RX was the fact that the ocelot
DSA tag, being put before Ethernet as it is, would overlap with the area
that a DSA master uses for RX filtering (destination MAC address
mainly).

Now that we can ask DSA to put the master in promiscuous mode, in theory
we could remove the prefix altogether and call it a day, but it looks
like we can't. Using no prefix on ingress, some packets (such as ICMP)
would be received, while others (such as PTP) would not be received.
This is because the DSA master we use (enetc) triggers parse errors
("MAC rx frame errors") presumably because it sees Ethernet frames with
a bad length. And indeed, when using no prefix, the EtherType (bytes
12-13 of the frame, bits 96-111) falls over the REW_VAL field from the
extraction header, aka the PTP timestamp.

When turning the short (32-bit) prefix on, the EtherType overlaps with
bits 64-79 of the extraction header, which are a reserved area
transmitted as zero by the switch. The packets are not dropped by the
DSA master with a short prefix. Actually, the frames look like this in
tcpdump (below is a PTP frame, with an extra dsa_8021q tag - dadb 0482 -
added by a downstream sja1105).

89:0c:a9:f2:01:00 > 88:80:00:0a:00:1d, 802.3, length 0: LLC, \
	dsap Unknown (0x10) Individual, ssap ProWay NM (0x0e) Response, \
	ctrl 0x0004: Information, send seq 2, rcv seq 0, \
	Flags [Response], length 78

0x0000:  8880 000a 001d 890c a9f2 0100 0000 100f  ................
0x0010:  0400 0000 0180 c200 000e 001f 7b63 0248  ............{c.H
0x0020:  dadb 0482 88f7 1202 0036 0000 0000 0000  .........6......
0x0030:  0000 0000 0000 0000 0000 001f 7bff fe63  ............{..c
0x0040:  0248 0001 1f81 0500 0000 0000 0000 0000  .H..............
0x0050:  0000 0000 0000 0000 0000 0000            ............

So the short prefix is our new default: we've shortened our RX frames by
12 octets, increased TX by 4, and headers are now equal between RX and
TX. Note that we still need promiscuous mode for the DSA master to not
drop it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:58 -07:00
Vladimir Oltean 707091eb26 net: dsa: tag_sja1105: request promiscuous mode for master
Currently PTP is broken when ports are in standalone mode (the tagger
keeps printing this message):

sja1105 spi0.1: Expected meta frame, is 01-80-c2-00-00-0e in the DSA master multicast filter?

Sure, one might say "simply add 01-80-c2-00-00-0e to the master's RX
filter" but things become more complicated because:

- Actually all frames in the 01-80-c2-xx-xx-xx and 01-1b-19-xx-xx-xx
  range are trapped to the CPU automatically
- The switch mangles bytes 3 and 4 of the MAC address via the incl_srcpt
  ("include source port [in the DMAC]") option, which is how source port
  and switch id identification is done for link-local traffic on RX. But
  this means that an address installed to the RX filter would, at the
  end of the day, not correspond to the final address seen by the DSA
  master.

Assume RX filtering lists on DSA masters are typically too small to
include all necessary addresses for PTP to work properly on sja1105, and
just request promiscuous mode unconditionally.

Just an example:
Assuming the following addresses are trapped to the CPU:
01-80-c2-00-00-00 to 01-80-c2-00-00-ff
01-1b-19-00-00-00 to 01-1b-19-00-00-ff

These are 512 addresses.
Now let's say this is a board with 3 switches, and 4 ports per switch.
The 512 addresses become 6144 addresses that must be managed by the DSA
master's RX filtering lists.

This may be refined in the future, but for now, it is simply not worth
it to add the additional addresses to the master's RX filter, so simply
request it to become promiscuous as soon as the driver probes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:58 -07:00
Vladimir Oltean c3975400c8 net: dsa: allow drivers to request promiscuous mode on master
Currently DSA assumes that taggers don't mess with the destination MAC
address of the frames on RX. That is not always the case. Some DSA
headers are placed before the Ethernet header (ocelot), and others
simply mangle random bytes from the destination MAC address (sja1105
with its incl_srcpt option).

Currently the DSA master goes to promiscuous mode automatically when the
slave devices go too (such as when enslaved to a bridge), but in
standalone mode this is a problem that needs to be dealt with.

So give drivers the possibility to signal that their tagging protocol
will get randomly dropped otherwise, and let DSA deal with fixing that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:17:58 -07:00
Vladimir Oltean e2f9a8fe73 net: mscc: ocelot: always pass skb clone to ocelot_port_add_txtstamp_skb
Currently, ocelot switchdev passes the skb directly to the function that
enqueues it to the list of skb's awaiting a TX timestamp. Whereas the
felix DSA driver first clones the skb, then passes the clone to this
queue.

This matters because in the case of felix, the common IRQ handler, which
is ocelot_get_txtstamp(), currently clones the clone, and frees the
original clone. This is useless and can be simplified by using
skb_complete_tx_timestamp() instead of skb_tstamp_tx().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:47:56 -07:00
Florian Fainelli ed409f3bba net: dsa: b53: Configure VLANs while not filtering
Update the B53 driver to support VLANs while not filtering. This
requires us to enable VLAN globally within the switch upon driver
initial configuration (dev->vlan_enabled).

We also need to remove the code that dealt with PVID re-configuration in
b53_vlan_filtering() since that function worked under the assumption
that it would only be called to make a bridge VLAN filtering, or not
filtering, and we would attempt to move the port's PVID accordingly.

Now that VLANs are programmed all the time, even in the case of a
non-VLAN filtering bridge, we would be programming a default_pvid for
the bridged switch ports.

We need the DSA receive path to pop the VLAN tag if it is the bridge's
default_pvid because the CPU port is always programmed tagged in the
programmed VLANs. In order to do so we utilize the
dsa_untag_bridge_pvid() helper introduced in the commit before within
net/dsa/tag_brcm.c.

Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 18:13:45 -07:00
Vladimir Oltean 412a1526d0 net: dsa: untag the bridge pvid from rx skbs
Currently the bridge untags VLANs present in its VLAN groups in
__allowed_ingress() only when VLAN filtering is enabled.

But when a skb is seen on the RX path as tagged with the bridge's pvid,
and that bridge has vlan_filtering=0, and there isn't any 8021q upper
with that VLAN either, then we have a problem. The bridge will not untag
it (since it is supposed to remain VLAN-unaware), and pvid-tagged
communication will be broken.

There are 2 situations where we can end up like that:

1. When installing a pvid in egress-tagged mode, like this:

ip link add dev br0 type bridge vlan_filtering 0
ip link set swp0 master br0
bridge vlan del dev swp0 vid 1
bridge vlan add dev swp0 vid 1 pvid

This happens because DSA configures the VLAN membership of the CPU port
using the same flags as swp0 (in this case "pvid and not untagged"), in
an attempt to copy the frame as-is from ingress to the CPU.

However, in this case, the packet may arrive untagged on ingress, it
will be pvid-tagged by the ingress port, and will be sent as
egress-tagged towards the CPU. Otherwise stated, the CPU will see a VLAN
tag where there was none to speak of on ingress.

When vlan_filtering is 1, this is not a problem, as stated in the first
paragraph, because __allowed_ingress() will pop it. But currently, when
vlan_filtering is 0 and we have such a VLAN configuration, we need an
8021q upper (br0.1) to be able to ping over that VLAN, which is not
symmetrical with the vlan_filtering=1 case, and therefore, confusing for
users.

Basically what DSA attempts to do is simply an approximation: try to
copy the skb with (or without) the same VLAN all the way up to the CPU.
But DSA drivers treat CPU port VLAN membership in various ways (which is
a good segue into situation 2). And some of those drivers simply tell
the CPU port to copy the frame unmodified, which is the golden standard
when it comes to VLAN processing (therefore, any driver which can
configure the hardware to do that, should do that, and discard the VLAN
flags requested by DSA on the CPU port).

2. Some DSA drivers always configure the CPU port as egress-tagged, in
an attempt to recover the classified VLAN from the skb. These drivers
cannot work at all with untagged traffic when bridged in
vlan_filtering=0 mode. And they can't go for the easy "just keep the
pvid as egress-untagged towards the CPU" route, because each front port
can have its own pvid, and that might require conflicting VLAN
membership settings on the CPU port (swp1 is pvid for VID 1 and
egress-tagged for VID 2; swp2 is egress-taggeed for VID 1 and pvid for
VID 2; with this simplistic approach, the CPU port, which is really a
separate hardware entity and has its own VLAN membership settings, would
end up being egress-untagged in both VID 1 and VID 2, therefore losing
the VLAN tags of ingress traffic).

So the only thing we can do is to create a helper function for resolving
the problematic case (that is, a function which untags the bridge pvid
when that is in vlan_filtering=0 mode), which taggers in need should
call. It isn't called from the generic DSA receive path because there
are drivers that fall neither in the first nor second category.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 18:13:45 -07:00
David S. Miller 3ab0a7a0c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Two minor conflicts:

1) net/ipv4/route.c, adding a new local variable while
   moving another local variable and removing it's
   initial assignment.

2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
   One pretty prints the port mode differently, whilst another
   changes the driver to try and obtain the port mode from
   the port node rather than the switch node.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22 16:45:34 -07:00
Vladimir Oltean 88525fc01c net: dsa: tag_sja1105: add compatibility with hwaccel VLAN tags
Check whether there is any hwaccel VLAN tag on RX, and if there is,
treat it as the tag_8021q header.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:34 -07:00
Vladimir Oltean bbed0bbddd net: dsa: tag_8021q: add VLANs to the master interface too
The whole purpose of tag_8021q is to send VLAN-tagged traffic to the
CPU, from which the driver can decode the source port and switch id.

Currently this only works if the VLAN filtering on the master is
disabled. Change that by explicitly adding code to tag_8021q.c to add
the VLANs corresponding to the tags to the filter of the master
interface.

Because we now need to call vlan_vid_add, then we also need to hold the
RTNL mutex. Propagate that requirement to the callers of dsa_8021q_setup
and modify the existing call sites as appropriate. Note that one call
path, sja1105_best_effort_vlan_filtering_set -> sja1105_vlan_filtering
-> sja1105_setup_8021q_tagging, was already holding this lock.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:34 -07:00
Vladimir Oltean 2209158c90 net: dsa: install VLANs into the master's RX filter too
Most DSA switch tags shift the EtherType to the right, causing the
master to not parse the VLAN as VLAN.
However, not all switches do that (example: tail tags, tag_8021q etc),
and if the DSA master has "rx-vlan-filter: on" in ethtool -k, then we
have a problem.

Therefore, we could populate the VLAN table of the master, just in case
(for some switches it will not make a difference), so that network I/O
can work even with a VLAN filtering master.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:34 -07:00
Vladimir Oltean adb256eb17 net: dsa: allow 8021q uppers while the bridge has vlan_filtering=0
When the bridge has VLAN awareness disabled there isn't any duplication
of functionality, since the bridge does not process VLAN. Don't deny
adding 8021q uppers to DSA switch ports in that case. The switch is
supposed to simply pass traffic leaving the VLAN tag as-is, and the
stack will happily strip the VLAN tag for all 8021q uppers that exist.

We need to ensure that there are no 8021q uppers when the user attempts
to enable bridge vlan_filtering.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:34 -07:00
Vladimir Oltean 707ec383b3 net: dsa: refuse configuration in prepare phase of dsa_port_vlan_filtering()
The current logic beats me a little bit. The comment that "bridge skips
-EOPNOTSUPP, so skip the prepare phase" was introduced in commit
fb2dabad69 ("net: dsa: support VLAN filtering switchdev attr").

I'm not sure:
(a) ok, the bridge skips -EOPNOTSUPP, but, so what, where are we
    returning -EOPNOTSUPP?
(b) even if we are, and I'm just not seeing it, what is the causality
    relationship between the bridge skipping -EOPNOTSUPP and DSA
    skipping the prepare phase, and just returning zero?

One thing is certain beyond doubt though, and that is that DSA currently
refuses VLAN filtering from the "commit" phase instead of "prepare", and
that this is not a good thing:

ip link add br0 type bridge
ip link add br1 type bridge vlan_filtering 1
ip link set swp2 master br0
ip link set swp3 master br1
[ 3790.379389] 001: sja1105 spi0.1: VLAN filtering is a global setting
[ 3790.379399] 001: ------------[ cut here ]------------
[ 3790.379403] 001: WARNING: CPU: 1 PID: 515 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
[ 3790.379420] 001: swp3: Commit of attribute (id=6) failed.
[ 3790.379533] 001: [<c11ff588>] (switchdev_port_attr_set_now) from [<c11b62e4>] (nbp_vlan_init+0x84/0x148)
[ 3790.379544] 001: [<c11b62e4>] (nbp_vlan_init) from [<c11a2ff0>] (br_add_if+0x514/0x670)
[ 3790.379554] 001: [<c11a2ff0>] (br_add_if) from [<c1031b5c>] (do_setlink+0x38c/0xab0)
[ 3790.379565] 001: [<c1031b5c>] (do_setlink) from [<c1036fe8>] (__rtnl_newlink+0x44c/0x748)
[ 3790.379573] 001: [<c1036fe8>] (__rtnl_newlink) from [<c1037328>] (rtnl_newlink+0x44/0x60)
[ 3790.379580] 001: [<c1037328>] (rtnl_newlink) from [<c10315fc>] (rtnetlink_rcv_msg+0x124/0x2f8)
[ 3790.379590] 001: [<c10315fc>] (rtnetlink_rcv_msg) from [<c10926b8>] (netlink_rcv_skb+0xb8/0x110)
[ 3790.379806] 001: ---[ end trace 0000000000000002 ]---
[ 3790.379819] 001: sja1105 spi0.1 swp3: failed to initialize vlan filtering on this port

So move the current logic that may fail (except ds->ops->port_vlan_filtering,
that is way harder) into the prepare stage of the switchdev transaction.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:34 -07:00
Vladimir Oltean 1ce39f0ee8 net: dsa: convert denying bridge VLAN with existing 8021q upper to PRECHANGEUPPER
This is checking for the following order of operations, and makes sure
to deny that configuration:

ip link add link swp2 name swp2.100 type vlan id 100
ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
bridge vlan add dev swp2 vid 100

Instead of using vlan_for_each(), which looks at the VLAN filters
installed with vlan_vid_add(), just track the 8021q uppers. This has the
advantage of freeing up the vlan_vid_add() call for actual VLAN
filtering.

There is another change in this patch. The check is moved in slave.c,
from switch.c. I don't think it makes sense to have this 8021q upper
check for each switch port that gets notified of that VLAN addition
(these include DSA links and CPU ports, we know those can't have 8021q
uppers because they don't have a net_device registered for them), so
just do it in slave.c, for that one slave interface.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:33 -07:00
Vladimir Oltean 2b13840672 net: dsa: convert check for 802.1Q upper when bridged into PRECHANGEUPPER
DSA tries to prevent having a VLAN added by a bridge and by an 802.1Q
upper at the same time. It does that by checking the VID in
.ndo_vlan_rx_add_vid(), since that's something that the 8021q module
calls, via vlan_vid_add(). When a VLAN matches in both subsystems, this
check returns -EBUSY.

However the vlan_vid_add() function isn't specific to the 8021q module
in any way at all. It is simply the kernel's way to tell an interface to
add a VLAN to its RX filter and not drop that VLAN. So there's no reason
to return -EBUSY when somebody tries to call vlan_vid_add() for a VLAN
that was installed by the bridge. The proper behavior is to accept that
configuration.

So what's wrong is how DSA checks that it has an 8021q upper. It should
look at the actual uppers for that, not just assume that the 8021q
module was somewhere in the call stack of .ndo_vlan_rx_add_vid().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:33 -07:00
Vladimir Oltean eb46e8da1d net: dsa: rename dsa_slave_upper_vlan_check to something more suggestive
We'll be adding a new check in the PRECHANGEUPPER notifier, where we'll
need to check some VLAN uppers. It is hard to do that when there is
already a function named dsa_slave_upper_vlan_check. So rename this one.

Not to mention that this function probably shouldn't have started with
"dsa_slave_" in the first place, since the struct net_device argument
isn't a DSA slave, but an 8021q upper of one.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:33 -07:00
Vladimir Oltean 8350129930 net: dsa: deny enslaving 802.1Q upper to VLAN-aware bridge from PRECHANGEUPPER
There doesn't seem to be any strong technical reason for doing it this
way, but we'll be adding more checks for invalid upper device
configurations, and it will be easier to have them all grouped under
PRECHANGEUPPER.

Tested that it still works:
ip link set br0 type bridge vlan_filtering 1
ip link add link swp2 name swp2.100 type vlan id 100
ip link set swp2.100 master br0
[   20.321312] br0: port 5(swp2.100) entered blocking state
[   20.326711] br0: port 5(swp2.100) entered disabled state
Error: dsa_core: Cannot enslave VLAN device into VLAN aware bridge.
[   20.346549] br0: port 5(swp2.100) entered blocking state
[   20.351957] br0: port 5(swp2.100) entered disabled state

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:33 -07:00
Andrew Lunn 0f06b855a9 net: dsa: wire up devlink info get
Allow the DSA drivers to implement the devlink call to get info info,
e.g. driver name, firmware version, ASIC ID, etc.

v2:
Combine declaration and the assignment on a single line.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 18:18:30 -07:00
Andrew Lunn 97c82c2313 net: dsa: Add devlink regions support to DSA
Allow DSA drivers to make use of devlink regions, via simple wrappers.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 18:17:45 -07:00
Andrew Lunn ccc3e6b019 net: dsa: Add helper to convert from devlink to ds
Given a devlink instance, return the dsa switch it is associated to.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 18:17:45 -07:00
Vladimir Oltean 6565243c06 net: mscc: ocelot: add locking for the port TX timestamp ID
The ocelot_port->ts_id is used to:
(a) populate skb->cb[0] for matching the TX timestamp in the PTP IRQ
    with an skb.
(b) populate the REW_OP from the injection header of the ongoing skb.
Only then is ocelot_port->ts_id incremented.

This is a problem because, at least theoretically, another timestampable
skb might use the same ocelot_port->ts_id before that is incremented.
Normally all transmit calls are serialized by the netdev transmit
spinlock, but in this case, ocelot_port_add_txtstamp_skb() is also
called by DSA, which has started declaring the NETIF_F_LLTX feature
since commit 2b86cb8299 ("net: dsa: declare lockless TX feature for
slave ports").  So the logic of using and incrementing the timestamp id
should be atomic per port.

The solution is to use the global ocelot_port->ts_id only while
protected by the associated ocelot_port->ts_id_lock. That's where we
populate skb->cb[0]. Note that for ocelot, ocelot_port_add_txtstamp_skb
is called for the actual skb, but for felix, it is called for the skb's
clone. That is something which will also be changed in the future.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 13:52:33 -07:00
Vladimir Oltean 88236591ec Revert "net: dsa: Add more convenient functions for installing port VLANs"
This reverts commit 314f76d7a6.

Citing that commit message, the call graph was:

    dsa_slave_vlan_rx_add_vid   dsa_port_setup_8021q_tagging
                |                        |
                |                        |
                |          +-------------+
                |          |
                v          v
               dsa_port_vid_add      dsa_slave_port_obj_add
                      |                         |
                      +-------+         +-------+
                              |         |
                              v         v
                           dsa_port_vlan_add

Now that tag_8021q has its own ops structure, it no longer relies on
dsa_port_vid_add, and therefore on the dsa_switch_ops to install its
VLANs.

So dsa_port_vid_add now only has one single caller. So we can simplify
the call graph to what it was before, aka:

        dsa_slave_vlan_rx_add_vid     dsa_slave_port_obj_add
                      |                         |
                      +-------+         +-------+
                              |         |
                              v         v
                           dsa_port_vlan_add

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11 17:30:43 -07:00
Vladimir Oltean 5899ee367a net: dsa: tag_8021q: add a context structure
While working on another tag_8021q driver implementation, some things
became apparent:

- It is not mandatory for a DSA driver to offload the tag_8021q VLANs by
  using the VLAN table per se. For example, it can add custom TCAM rules
  that simply encapsulate RX traffic, and redirect & decapsulate rules
  for TX traffic. For such a driver, it makes no sense to receive the
  tag_8021q configuration through the same callback as it receives the
  VLAN configuration from the bridge and the 8021q modules.

- Currently, sja1105 (the only tag_8021q user) sets a
  priv->expect_dsa_8021q variable to distinguish between the bridge
  calling, and tag_8021q calling. That can be improved, to say the
  least.

- The crosschip bridging operations are, in fact, stateful already. The
  list of crosschip_links must be kept by the caller and passed to the
  relevant tag_8021q functions.

So it would be nice if the tag_8021q configuration was more
self-contained. This patch attempts to do that.

Create a struct dsa_8021q_context which encapsulates a struct
dsa_switch, and has 2 function pointers for adding and deleting a VLAN.
These will replace the previous channel to the driver, which was through
the .port_vlan_add and .port_vlan_del callbacks of dsa_switch_ops.

Also put the list of crosschip_links into this dsa_8021q_context.
Drivers that don't support cross-chip bridging can simply omit to
initialize this list, as long as they dont call any cross-chip function.

The sja1105_vlan_add and sja1105_vlan_del functions are refactored into
a smaller sja1105_vlan_add_one, which now has 2 entry points:
- sja1105_vlan_add, from struct dsa_switch_ops
- sja1105_dsa_8021q_vlan_add, from the tag_8021q ops
But even this change is fairly trivial. It just reflects the fact that
for sja1105, the VLANs from these 2 channels end up in the same hardware
table. However that is not necessarily true in the general sense (and
that's the reason for making this change).

The rest of the patch is mostly plain refactoring of "ds" -> "ctx". The
dsa_8021q_context structure needs to be propagated because adding a VLAN
is now done through the ops function pointers inside of it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11 17:30:43 -07:00
Vladimir Oltean 7e092af2f3 net: dsa: tag_8021q: setup tagging via a single function call
There is no point in calling dsa_port_setup_8021q_tagging for each
individual port. Additionally, it will become more difficult to do that
when we'll have a context structure to tag_8021q (next patch). So
refactor this now.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11 17:30:43 -07:00
Vladimir Oltean 2f1e8ea726 net: dsa: link interfaces with the DSA master to get rid of lockdep warnings
Since commit 845e0ebb44 ("net: change addr_list_lock back to static
key"), cascaded DSA setups (DSA switch port as DSA master for another
DSA switch port) are emitting this lockdep warning:

============================================
WARNING: possible recursive locking detected
5.8.0-rc1-00133-g923e4b5032dd-dirty #208 Not tainted
--------------------------------------------
dhcpcd/323 is trying to acquire lock:
ffff000066dd4268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90

but task is already holding lock:
ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&dsa_master_addr_list_lock_key/1);
  lock(&dsa_master_addr_list_lock_key/1);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by dhcpcd/323:
 #0: ffffdbd1381dda18 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
 #1: ffff00006614b268 (_xmit_ETHER){+...}-{2:2}, at: dev_set_rx_mode+0x28/0x48
 #2: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90

stack backtrace:
Call trace:
 dump_backtrace+0x0/0x1e0
 show_stack+0x20/0x30
 dump_stack+0xec/0x158
 __lock_acquire+0xca0/0x2398
 lock_acquire+0xe8/0x440
 _raw_spin_lock_nested+0x64/0x90
 dev_mc_sync+0x44/0x90
 dsa_slave_set_rx_mode+0x34/0x50
 __dev_set_rx_mode+0x60/0xa0
 dev_mc_sync+0x84/0x90
 dsa_slave_set_rx_mode+0x34/0x50
 __dev_set_rx_mode+0x60/0xa0
 dev_set_rx_mode+0x30/0x48
 __dev_open+0x10c/0x180
 __dev_change_flags+0x170/0x1c8
 dev_change_flags+0x2c/0x70
 devinet_ioctl+0x774/0x878
 inet_ioctl+0x348/0x3b0
 sock_do_ioctl+0x50/0x310
 sock_ioctl+0x1f8/0x580
 ksys_ioctl+0xb0/0xf0
 __arm64_sys_ioctl+0x28/0x38
 el0_svc_common.constprop.0+0x7c/0x180
 do_el0_svc+0x2c/0x98
 el0_sync_handler+0x9c/0x1b8
 el0_sync+0x158/0x180

Since DSA never made use of the netdev API for describing links between
upper devices and lower devices, the dev->lower_level value of a DSA
switch interface would be 1, which would warn when it is a DSA master.

We can use netdev_upper_dev_link() to describe the relationship between
a DSA slave and a DSA master. To be precise, a DSA "slave" (switch port)
is an "upper" to a DSA "master" (host port). The relationship is "many
uppers to one lower", like in the case of VLAN. So, for that reason, we
use the same function as VLAN uses.

There might be a chance that somebody will try to take hold of this
interface and use it immediately after register_netdev() and before
netdev_upper_dev_link(). To avoid that, we do the registration and
linkage while holding the RTNL, and we use the RTNL-locked cousin of
register_netdev(), which is register_netdevice().

Since this warning was not there when lockdep was using dynamic keys for
addr_list_lock, we are blaming the lockdep patch itself. The network
stack _has_ been using static lockdep keys before, and it _is_ likely
that stacked DSA setups have been triggering these lockdep warnings
since forever, however I can't test very old kernels on this particular
stacked DSA setup, to ensure I'm not in fact introducing regressions.

Fixes: 845e0ebb44 ("net: change addr_list_lock back to static key")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-08 19:40:09 -07:00
Vladimir Oltean 4349abdb40 net: dsa: don't print non-fatal MTU error if not supported
Commit 72579e14a1 ("net: dsa: don't fail to probe if we couldn't set
the MTU") changed, for some reason, the "err && err != -EOPNOTSUPP"
check into a simple "err". This causes the MTU warning to be printed
even for drivers that don't have the MTU operations implemented.
Fix that.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 21:01:50 -07:00
Vladimir Oltean c9ebf126f1 net: dsa: change PHY error message again
slave_dev->name is only populated at this stage if it was specified
through a label in the device tree. However that is not mandatory.
When it isn't, the error message looks like this:

[    5.037057] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
[    5.044672] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
[    5.052275] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
[    5.059877] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d

which is especially confusing since the error gets printed on behalf of
the DSA master (fsl_enetc in this case).

Printing an error message that contains a valid reference to the DSA
port's name is difficult at this point in the initialization stage, so
at least we should print some info that is more reliable, even if less
user-friendly. That may be the driver name and the hardware port index.

After this change, the error is printed as:

[    6.051587] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 0
[    6.061192] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 1
[    6.070765] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 2
[    6.080324] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 3

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-07 21:00:53 -07:00
Gustavo A. R. Silva df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Vladimir Oltean 5df5661a13 net: dsa: stop overriding master's ndo_get_phys_port_name
The purpose of this override is to give the user an indication of what
the number of the CPU port is (in DSA, the CPU port is a hardware
implementation detail and not a network interface capable of traffic).

However, it has always failed (by design) at providing this information
to the user in a reliable fashion.

Prior to commit 3369afba1e ("net: Call into DSA netdevice_ops
wrappers"), the behavior was to only override this callback if it was
not provided by the DSA master.

That was its first failure: if the DSA master itself was a DSA port or a
switchdev, then the user would not see the number of the CPU port in
/sys/class/net/eth0/phys_port_name, but the number of the DSA master
port within its respective physical switch.

But that was actually ok in a way. The commit mentioned above changed
that behavior, and now overrides the master's ndo_get_phys_port_name
unconditionally. That comes with problems of its own, which are worse in
a way.

The idea is that it's typical for switchdev users to have udev rules for
consistent interface naming. These are based, among other things, on
the phys_port_name attribute. If we let the DSA switch at the bottom
to start randomly overriding ndo_get_phys_port_name with its own CPU
port, we basically lose any predictability in interface naming, or even
uniqueness, for that matter.

So, there are reasons to let DSA override the master's callback (to
provide a consistent interface, a number which has a clear meaning and
must not be interpreted according to context), and there are reasons to
not let DSA override it (it breaks udev matching for the DSA master).

But, there is an alternative method for users to retrieve the number of
the CPU port of each DSA switch in the system:

  $ devlink port
  pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0
  pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2
  pci/0000:00:00.5/4: type notset flavour cpu port 4
  spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0
  spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1
  spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2
  spi/spi2.0/4: type notset flavour cpu port 4
  spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0
  spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1
  spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2
  spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3
  spi/spi2.1/4: type notset flavour cpu port 4

So remove this duplicated, unreliable and troublesome method. From this
patch on, the phys_port_name attribute of the DSA master will only
contain information about itself (if at all). If the users need reliable
information about the CPU port they're probably using devlink anyway.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23 15:14:58 -07:00
Kurt Kanzenbach 85e05d263e net: dsa: of: Allow ethernet-ports as encapsulating node
Due to unified Ethernet Switch Device Tree Bindings allow for ethernet-ports as
encapsulating node as well.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 16:56:43 -07:00
Vladimir Oltean 71d4364abd net: dsa: use the ETH_MIN_MTU and ETH_DATA_LEN default values
Now that DSA supports MTU configuration, undo the effects of commit
8b1efc0f83 ("net: remove MTU limits on a few ether_setup callers") and
let DSA interfaces use the default min_mtu and max_mtu specified by
ether_setup(). This is more important for min_mtu: since DSA is
Ethernet, the minimum MTU is the same as of any other Ethernet
interface, and definitely not zero. For the max_mtu, we have a callback
through which drivers can override that, if they want to.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 18:35:04 -07:00
Florian Fainelli 9c0c7014f3 net: dsa: Setup dsa_netdev_ops
Now that we have all the infrastructure in place for calling into the
dsa_ptr->netdev_ops function pointers, install them when we configure
the DSA CPU/management interface and tear them down. The flow is
unchanged from before, but now we preserve equality of tests when
network device drivers do tests like dev->netdev_ops == &foo_ops which
was not the case before since we were allocating an entirely new
structure.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-20 16:48:22 -07:00
Vladimir Oltean 67c2404922 net: dsa: felix: create a template for the DSA tags on xmit
With this patch we try to kill 2 birds with 1 stone.

First of all, some switches that use tag_ocelot.c don't have the exact
same bitfield layout for the DSA tags. The destination ports field is
different for Seville VSC9953 for example. So the choices are to either
duplicate tag_ocelot.c into a new tag_seville.c (sub-optimal) or somehow
take into account a supposed ocelot->dest_ports_offset when packing this
field into the DSA injection header (again not ideal).

Secondly, tag_ocelot.c already needs to memset a 128-bit area to zero
and call some packing() functions of dubious performance in the
fastpath. And most of the values it needs to pack are pretty much
constant (BYPASS=1, SRC_PORT=CPU, DEST=port index). So it would be good
if we could improve that.

The proposed solution is to allocate a memory area per port at probe
time, initialize that with the statically defined bits as per chip
hardware revision, and just perform a simpler memcpy in the fastpath.

Other alternatives have been analyzed, such as:
- Create a separate tag_seville.c: too much code duplication for just 1
  bit field difference.
- Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like
  tag_brcm.c, which would have a separate .xmit function. Again, too
  much code duplication for just 1 bit field difference.
- Allocate the template from the init function of the tag_ocelot.c
  module, instead of from the driver: couldn't figure out a method of
  accessing the correct port template corresponding to the correct
  tagger in the .xmit function.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:40:01 -07:00
Danielle Ratson 71ad8d55f8 devlink: Replace devlink_port_attrs_set parameters with a struct
Currently, devlink_port_attrs_set accepts a long list of parameters,
that most of them are devlink port's attributes.

Use the devlink_port_attrs struct to replace the relevant parameters.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:29 -07:00
Linus Walleij efd7fe68f0 net: dsa: tag_rtl4_a: Implement Realtek 4 byte A tag
This implements the known parts of the Realtek 4 byte
tag protocol version 0xA, as found in the RTL8366RB
DSA switch.

It is designated as protocol version 0xA as a
different Realtek 4 byte tag format with protocol
version 0x9 is known to exist in the Realtek RTL8306
chips.

The tag and switch chip lacks public documentation, so
the tag format has been reverse-engineered from
packet dumps. As only ingress traffic has been available
for analysis an egress tag has not been possible to
develop (even using educated guesses about bit fields)
so this is as far as it gets. It is not known if the
switch even supports egress tagging.

Excessive attempts to figure out the egress tag format
was made. When nothing else worked, I just tried all bit
combinations with 0xannp where a is protocol and p is
port. I looped through all values several times trying
to get a response from ping, without any positive
result.

Using just these ingress tags however, the switch
functionality is vastly improved and the packets find
their way into the destination port without any
tricky VLAN configuration. On the D-Link DIR-685 the
LAN ports now come up and respond to ping without
any command line configuration so this is a real
improvement for users.

Egress packets need to be restricted to the proper
target ports using VLAN, which the RTL8366RB DSA
switch driver already sets up.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 15:36:19 -07:00
Andrew Lunn 99bac53d06 net: dsa: tag_qca.c: Fix warning for __be16 vs u16
net/dsa/tag_qca.c:48:15: warning: incorrect type in assignment (different base types)
net/dsa/tag_qca.c:48:15:    expected unsigned short [usertype]
net/dsa/tag_qca.c:48:15:    got restricted __be16 [usertype]
net/dsa/tag_qca.c:68:13: warning: incorrect type in assignment (different base types)
net/dsa/tag_qca.c:68:13:    expected restricted __be16 [usertype] hdr
net/dsa/tag_qca.c:68:13:    got int
net/dsa/tag_qca.c:71:16: warning: restricted __be16 degrades to integer
net/dsa/tag_qca.c:81:17: warning: restricted __be16 degrades to integer

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-05 15:31:58 -07:00
Andrew Lunn 04d63f9d91 net: dsa: tag_mtk: Fix warnings for __be16
net/dsa/tag_mtk.c:84:13: warning: incorrect type in assignment (different base types)
net/dsa/tag_mtk.c:84:13:    expected restricted __be16 [usertype] hdr
net/dsa/tag_mtk.c:84:13:    got int
net/dsa/tag_mtk.c:94:17: warning: restricted __be16 degrades to integer

The result of a ntohs() is not __be16, but u16.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-05 15:31:58 -07:00
Andrew Lunn 802734ad77 net: dsa: tag_lan9303: Fix __be16 warnings
net/dsa/tag_lan9303.c:76:24: warning: incorrect type in assignment (different base types)
net/dsa/tag_lan9303.c:76:24:    expected unsigned short [usertype]
net/dsa/tag_lan9303.c:76:24:    got restricted __be16 [usertype]
net/dsa/tag_lan9303.c:80:24: warning: incorrect type in assignment (different base types)
net/dsa/tag_lan9303.c:80:24:    expected unsigned short [usertype]
net/dsa/tag_lan9303.c:80:24:    got restricted __be16 [usertype]
net/dsa/tag_lan9303.c:106:31: warning: restricted __be16 degrades to integer
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16

Make use of __be16 where appropriate to fix these warnings.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-05 15:31:58 -07:00
Andrew Lunn ed6444ea03 net: dsa: tag_ksz: Fix __be16 warnings
cpu_to_be16 returns a __be16 value. So what it is assigned to needs to
have the same type to avoid warnings.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-05 15:31:58 -07:00
Andrew Lunn a61bf20831 net: dsa: Add __percpu property to prevent warnings
net/dsa/slave.c:505:13: warning: incorrect type in initializer (different address spaces)
net/dsa/slave.c:505:13:    expected void const [noderef] <asn:3> *__vpp_verify
net/dsa/slave.c:505:13:    got struct pcpu_sw_netstats *

Add the needed _percpu property to prevent this warning.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-05 15:31:58 -07:00
Florian Fainelli 65951a9eb6 net: dsa: Improve subordinate PHY error message
It is not very informative to know the DSA master device when a
subordinate network device fails to get its PHY setup. Provide the
device name and capitalize PHY while we are it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30 12:44:20 -07:00
Daniel Mack 1ed9ec9b08 dsa: Allow forwarding of redirected IGMP traffic
The driver for Marvell switches puts all ports in IGMP snooping mode
which results in all IGMP/MLD frames that ingress on the ports to be
forwarded to the CPU only.

The bridge code in the kernel can then interpret these frames and act
upon them, for instance by updating the mdb in the switch to reflect
multicast memberships of stations connected to the ports. However,
the IGMP/MLD frames must then also be forwarded to other ports of the
bridge so external IGMP queriers can track membership reports, and
external multicast clients can receive query reports from foreign IGMP
queriers.

Currently, this is impossible as the EDSA tagger sets offload_fwd_mark
on the skb when it unwraps the tagged frames, and that will make the
switchdev layer prevent the skb from egressing on any other port of
the same switch.

To fix that, look at the To_CPU code in the DSA header and make
forwarding of the frame possible for trapped IGMP packets.

Introduce some #defines for the frame types to make the code a bit more
comprehensive.

This was tested on a Marvell 88E6352 variant.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24 14:39:43 -07:00
Linus Torvalds 96144c58ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix cfg80211 deadlock, from Johannes Berg.

 2) RXRPC fails to send norigications, from David Howells.

 3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from
    Geliang Tang.

 4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu.

 5) The ucc_geth driver needs __netdev_watchdog_up exported, from
    Valentin Longchamp.

 6) Fix hashtable memory leak in dccp, from Wang Hai.

 7) Fix how nexthops are marked as FDB nexthops, from David Ahern.

 8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni.

 9) Fix crashes in tipc_disc_rcv(), from Tuong Lien.

10) Fix link speed reporting in iavf driver, from Brett Creeley.

11) When a channel is used for XSK and then reused again later for XSK,
    we forget to clear out the relevant data structures in mlx5 which
    causes all kinds of problems. Fix from Maxim Mikityanskiy.

12) Fix memory leak in genetlink, from Cong Wang.

13) Disallow sockmap attachments to UDP sockets, it simply won't work.
    From Lorenz Bauer.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: ethernet: ti: ale: fix allmulti for nu type ale
  net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init
  net: atm: Remove the error message according to the atomic context
  bpf: Undo internal BPF_PROBE_MEM in BPF insns dump
  libbpf: Support pre-initializing .bss global variables
  tools/bpftool: Fix skeleton codegen
  bpf: Fix memlock accounting for sock_hash
  bpf: sockmap: Don't attach programs to UDP sockets
  bpf: tcp: Recv() should return 0 when the peer socket is closed
  ibmvnic: Flush existing work items before device removal
  genetlink: clean up family attributes allocations
  net: ipa: header pad field only valid for AP->modem endpoint
  net: ipa: program upper nibbles of sequencer type
  net: ipa: fix modem LAN RX endpoint id
  net: ipa: program metadata mask differently
  ionic: add pcie_print_link_status
  rxrpc: Fix race between incoming ACK parser and retransmitter
  net/mlx5: E-Switch, Fix some error pointer dereferences
  net/mlx5: Don't fail driver on failure to create debugfs
  net/mlx5e: CT: Fix ipv6 nat header rewrite actions
  ...
2020-06-13 16:27:13 -07:00
Masahiro Yamada a7f7f6248d treewide: replace '---help---' in Kconfig files with 'help'
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---'    (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-14 01:57:21 +09:00
Cong Wang 845e0ebb44 net: change addr_list_lock back to static key
The dynamic key update for addr_list_lock still causes troubles,
for example the following race condition still exists:

CPU 0:				CPU 1:
(RCU read lock)			(RTNL lock)
dev_mc_seq_show()		netdev_update_lockdep_key()
				  -> lockdep_unregister_key()
 -> netif_addr_lock_bh()

because lockdep doesn't provide an API to update it atomically.
Therefore, we have to move it back to static keys and use subclass
for nest locking like before.

In commit 1a33e10e4a ("net: partially revert dynamic lockdep key
changes"), I already reverted most parts of commit ab92d68fc2
("net: core: add generic lockdep keys").

This patch reverts the rest and also part of commit f3b0a18bb6
("net: remove unnecessary variables and callback"). After this
patch, addr_list_lock changes back to using static keys and
subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
not have to change back to ->ndo_get_lock_subclass().

And hopefully this reduces some syzbot lockdep noises too.

Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-09 12:59:45 -07:00
David S. Miller 1806c13dc2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.

The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-31 17:48:46 -07:00
Vladimir Oltean 04198499b2 net: dsa: tag_8021q: stop restoring VLANs from bridge
Right now, our only tag_8021q user, sja1105, has the ability to restore
bridge VLANs on its own, so this logic is unnecessary.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-29 16:45:46 -07:00
Vladimir Oltean 2b86cb8299 net: dsa: declare lockless TX feature for slave ports
Be there a platform with the following layout:

      Regular NIC
       |
       +----> DSA master for switch port
               |
               +----> DSA master for another switch port

After changing DSA back to static lockdep class keys in commit
1a33e10e4a ("net: partially revert dynamic lockdep key changes"), this
kernel splat can be seen:

[   13.361198] ============================================
[   13.366524] WARNING: possible recursive locking detected
[   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
[   13.377874] --------------------------------------------
[   13.383201] swapper/0/0 is trying to acquire lock:
[   13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.397879]
[   13.397879] but task is already holding lock:
[   13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.413593]
[   13.413593] other info that might help us debug this:
[   13.420140]  Possible unsafe locking scenario:
[   13.420140]
[   13.426075]        CPU0
[   13.428523]        ----
[   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.440924]
[   13.440924]  *** DEADLOCK ***
[   13.440924]
[   13.446860]  May be due to missing lock nesting notation
[   13.446860]
[   13.453668] 6 locks held by swapper/0/0:
[   13.457598]  #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
[   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
[   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[   13.492793]  #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[   13.512000]
[   13.512000] stack backtrace:
[   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
[   13.530421] Call trace:
[   13.532871]  dump_backtrace+0x0/0x1d8
[   13.536539]  show_stack+0x24/0x30
[   13.539862]  dump_stack+0xe8/0x150
[   13.543271]  __lock_acquire+0x1030/0x1678
[   13.547290]  lock_acquire+0xf8/0x458
[   13.550873]  _raw_spin_lock+0x44/0x58
[   13.554543]  __dev_queue_xmit+0x84c/0xbe0
[   13.558562]  dev_queue_xmit+0x24/0x30
[   13.562232]  dsa_slave_xmit+0xe0/0x128
[   13.565988]  dev_hard_start_xmit+0xf4/0x448
[   13.570182]  __dev_queue_xmit+0x808/0xbe0
[   13.574200]  dev_queue_xmit+0x24/0x30
[   13.577869]  neigh_resolve_output+0x15c/0x220
[   13.582237]  ip6_finish_output2+0x244/0xb10
[   13.586430]  __ip6_finish_output+0x1dc/0x298
[   13.590709]  ip6_output+0x84/0x358
[   13.594116]  mld_sendpack+0x2bc/0x560
[   13.597786]  mld_ifc_timer_expire+0x210/0x390
[   13.602153]  call_timer_fn+0xcc/0x400
[   13.605822]  run_timer_softirq+0x588/0x6e0
[   13.609927]  __do_softirq+0x118/0x590
[   13.613597]  irq_exit+0x13c/0x148
[   13.616918]  __handle_domain_irq+0x6c/0xc0
[   13.621023]  gic_handle_irq+0x6c/0x160
[   13.624779]  el1_irq+0xbc/0x180
[   13.627927]  cpuidle_enter_state+0xb4/0x4d0
[   13.632120]  cpuidle_enter+0x3c/0x50
[   13.635703]  call_cpuidle+0x44/0x78
[   13.639199]  do_idle+0x228/0x2c8
[   13.642433]  cpu_startup_entry+0x2c/0x48
[   13.646363]  rest_init+0x1ac/0x280
[   13.649773]  arch_call_rest_init+0x14/0x1c
[   13.653878]  start_kernel+0x490/0x4bc

Lockdep keys themselves were added in commit ab92d68fc2 ("net: core:
add generic lockdep keys"), and it's very likely that this splat existed
since then, but I have no real way to check, since this stacked platform
wasn't supported by mainline back then.

>From Taehee's own words:

  This patch was considered that all stackable devices have LLTX flag.
  But the dsa doesn't have LLTX, so this splat happened.
  After this patch, dsa shares the same lockdep class key.
  On the nested dsa interface architecture, which you illustrated,
  the same lockdep class key will be used in __dev_queue_xmit() because
  dsa doesn't have LLTX.
  So that lockdep detects deadlock because the same lockdep class key is
  used recursively although actually the different locks are used.
  There are some ways to fix this problem.

  1. using NETIF_F_LLTX flag.
  If possible, using the LLTX flag is a very clear way for it.
  But I'm so sorry I don't know whether the dsa could have LLTX or not.

  2. using dynamic lockdep again.
  It means that each interface uses a separate lockdep class key.
  So, lockdep will not detect recursive locking.
  But this way has a problem that it could consume lockdep class key
  too many.
  Currently, lockdep can have 8192 lockdep class keys.
   - you can see this number with the following command.
     cat /proc/lockdep_stats
     lock-classes:                         1251 [max: 8192]
     ...
     The [max: 8192] means that the maximum number of lockdep class keys.
  If too many lockdep class keys are registered, lockdep stops to work.
  So, using a dynamic(separated) lockdep class key should be considered
  carefully.
  In addition, updating lockdep class key routine might have to be existing.
  (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())

  3. Using lockdep subclass.
  A lockdep class key could have 8 subclasses.
  The different subclass is considered different locks by lockdep
  infrastructure.
  But "lock-classes" is not counted by subclasses.
  So, it could avoid stopping lockdep infrastructure by an overflow of
  lockdep class keys.
  This approach should also have an updating lockdep class key routine.
  (lockdep_set_subclass())

  4. Using nonvalidate lockdep class key.
  The lockdep infrastructure supports nonvalidate lockdep class key type.
  It means this lockdep is not validated by lockdep infrastructure.
  So, the splat will not happen but lockdep couldn't detect real deadlock
  case because lockdep really doesn't validate it.
  I think this should be used for really special cases.
  (lockdep_set_novalidate_class())

Further discussion here:
https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/

There appears to be no negative side-effect to declaring lockless TX for
the DSA virtual interfaces, which means they handle their own locking.
So that's what we do to make the splat go away.

Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.

Fixes: ab92d68fc2 ("net: core: add generic lockdep keys")
Suggested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27 15:09:28 -07:00
David S. Miller 13209a8f73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 13:47:27 -07:00
DENG Qingfang 5e5502e012 net: dsa: mt7530: fix roaming from DSA user ports
When a client moves from a DSA user port to a software port in a bridge,
it cannot reach any other clients that connected to the DSA user ports.
That is because SA learning on the CPU port is disabled, so the switch
ignores the client's frames from the CPU port and still thinks it is at
the user port.

Fix it by enabling SA learning on the CPU port.

To prevent the switch from learning from flooding frames from the CPU
port, set skb->offload_fwd_mark to 1 for unicast and broadcast frames,
and let the switch flood them instead of trapping to the CPU port.
Multicast frames still need to be trapped to the CPU port for snooping,
so set the SA_DIS bit of the MTK tag to 1 when transmitting those frames
to disable SA learning.

Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:49:28 -07:00
Vladimir Oltean fb9f2e9286 net: dsa: tag_sja1105: appease sparse checks for ethertype accessors
A comparison between a value from the packet and an integer constant
value needs to be done by converting the value from the packet from
net->host, or the constant from host->net. Not the other way around.
Even though it makes no practical difference, correct that.

Fixes: 38b5beeae7 ("net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 18:02:42 -07:00
Vladimir Oltean 84eeb5d460 net: dsa: tag_sja1105: implement sub-VLAN decoding
Create a subvlan_map as part of each port's tagger private structure.
This keeps reverse mappings of bridge-to-dsa_8021q VLAN retagging rules.

Note that as of this patch, this piece of code is never engaged, due to
the fact that the driver hasn't installed any retagging rule, so we'll
always see packets with a subvlan code of 0 (untagged).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 13:08:08 -07:00
Vladimir Oltean 3eaae1d05f net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs
For switches that support VLAN retagging, such as sja1105, we extend
dsa_8021q by encoding a "sub-VLAN" into the remaining 3 free bits in the
dsa_8021q tag.

A sub-VLAN is nothing more than a number in the range 0-7, which serves
as an index into a per-port driver lookup table. The sub-VLAN value of
zero means that traffic is untagged (this is also backwards-compatible
with dsa_8021q without retagging).

The switch should be configured to retag VLAN-tagged traffic that gets
transmitted towards the CPU port (and towards the CPU only). Example:

bridge vlan add dev sw1p0 vid 100

The switch retags frames received on port 0, going to the CPU, and
having VID 100, to the VID of 1104 (0x0450). In dsa_8021q language:

 | 11  | 10  |  9  |  8  |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
 +-----------+-----+-----------------+-----------+-----------------------+
 |    DIR    | SVL |    SWITCH_ID    |  SUBVLAN  |          PORT         |
 +-----------+-----+-----------------+-----------+-----------------------+

0x0450 means:
 - DIR = 0b01: this is an RX VLAN
 - SUBVLAN = 0b001: this is subvlan #1
 - SWITCH_ID = 0b001: this is switch 1 (see the name "sw1p0")
 - PORT = 0b0000: this is port 0 (see the name "sw1p0")

The driver also remembers the "1 -> 100" mapping. In the hotpath, if the
sub-VLAN from the tag encodes a non-untagged frame, this mapping is used
to create a VLAN hwaccel tag, with the value of 100.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 13:08:08 -07:00
Vladimir Oltean 38b5beeae7 net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously
In VLAN-unaware mode, sja1105 uses VLAN tags with a custom TPID of
0xdadb. While in the yet-to-be introduced best_effort_vlan_filtering
mode, it needs to work with normal VLAN TPID values.

A complication arises when we must transmit a VLAN-tagged packet to the
switch when it's in VLAN-aware mode. We need to construct a packet with
2 VLAN tags, and the switch will use the outer header for routing and
pop it on egress. But sadly, here the 2 hardware generations don't
behave the same:

- E/T switches won't pop an ETH_P_8021AD tag on egress, it seems
  (packets will remain double-tagged).
- P/Q/R/S switches will drop a packet with 2 ETH_P_8021Q tags (it looks
  like it tries to prevent VLAN hopping).

But looks like the reverse is also true:

- E/T switches have no problem popping the outer tag from packets with
  2 ETH_P_8021Q tags.
- P/Q/R/S will have no problem popping a single tag even if that is
  ETH_P_8021AD.

So it is clear that if we want the hardware to work with dsa_8021q
tagging in VLAN-aware mode, we need to send different TPIDs depending on
revision. Keep that information in priv->info->qinq_tpid.

The per-port tagger structure will hold an xmit_tpid value that depends
not only upon the qinq_tpid, but also upon the VLAN awareness state
itself (in case we must transmit using 0xdadb).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 13:08:08 -07:00
Vladimir Oltean ec5ae61076 net: dsa: sja1105: save/restore VLANs using a delta commit method
Managing the VLAN table that is present in hardware will become very
difficult once we add a third operating state
(best_effort_vlan_filtering). That is because correct cleanup (not too
little, not too much) becomes virtually impossible, when VLANs can be
added from the bridge layer, from dsa_8021q for basic tagging, for
cross-chip bridging, as well as retagging rules for sub-VLANs and
cross-chip sub-VLANs. So we need to rethink VLAN interaction with the
switch in a more scalable way.

In preparation for that, use the priv->expect_dsa_8021q boolean to
classify any VLAN request received through .port_vlan_add or
.port_vlan_del towards either one of 2 internal lists: bridge VLANs and
dsa_8021q VLANs.

Then, implement a central sja1105_build_vlan_table method that creates a
VLAN configuration from scratch based on the 2 lists of VLANs kept by
the driver, and based on the VLAN awareness state. Currently, if we are
VLAN-unaware, install the dsa_8021q VLANs, otherwise the bridge VLANs.

Then, implement a delta commit procedure that identifies which VLANs
from this new configuration are actually different from the config
previously committed to hardware. We apply the delta through the dynamic
configuration interface (we don't reset the switch). The result is that
the hardware should see the exact sequence of operations as before this
patch.

This also helps remove the "br" argument passed to
dsa_8021q_crosschip_bridge_join, which it was only using to figure out
whether it should commit the configuration back to us or not, based on
the VLAN awareness state of the bridge. We can simplify that, by always
allowing those VLANs inside of our dsa_8021q_vlans list, and committing
those to hardware when necessary.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 13:08:08 -07:00
Vladimir Oltean 1f66b0f0ae net: dsa: tag_8021q: introduce a vid_is_dsa_8021q helper
This function returns a boolean denoting whether the VLAN passed as
argument is part of the 1024-3071 range that the dsa_8021q tagging
scheme uses.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 13:08:07 -07:00
Russell King 54a0ed0df4 net: dsa: provide an option for drivers to always receive bridge VLANs
DSA assumes that a bridge which has vlan filtering disabled is not
vlan aware, and ignores all vlan configuration. However, the kernel
software bridge code allows configuration in this state.

This causes the kernel's idea of the bridge vlan state and the
hardware state to disagree, so "bridge vlan show" indicates a correct
configuration but the hardware lacks all configuration. Even worse,
enabling vlan filtering on a DSA bridge immediately blocks all traffic
which, given the output of "bridge vlan show", is very confusing.

Provide an option that drivers can set to indicate they want to receive
vlan configuration even when vlan filtering is disabled. At the very
least, this is safe for Marvell DSA bridges, which do not look up
ingress traffic in the VTU if the port is in 8021Q disabled state. It is
also safe for the Ocelot switch family. Whether this change is suitable
for all DSA bridges is not known.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 13:08:07 -07:00
Florian Fainelli 097f024454 net: dsa: tag_sja1105: Constify dsa_device_ops
sja1105_netdev_ops should be const since that is what the DSA layer
expects.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-11 16:50:45 -07:00
Florian Fainelli 2fa3888bb7 net: dsa: ocelot: Constify dsa_device_ops
ocelot_netdev_ops should be const since that is what the DSA layer
expects.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-11 16:50:45 -07:00
Vladimir Oltean ac02a451a6 net: dsa: sja1105: implement cross-chip bridging operations
sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart
and which is compatible with cascading. A complete description of this
tagging format is in net/dsa/tag_8021q.c, but a quick summary is that
each external-facing port tags incoming frames with a unique pvid, and
this special VLAN is transmitted as tagged towards the inside of the
system, and as untagged towards the exterior. The tag encodes the switch
id and the source port index.

This means that cross-chip bridging for dsa_8021q only entails adding
the dsa_8021q pvids of one switch to the RX filter of the other
switches. Everything else falls naturally into place, as long as the
bottom-end of ports (the leaves in the tree) is comprised exclusively of
dsa_8021q-compatible (i.e. sja1105 switches). Otherwise, there would be
a chance that a front-panel switch transmits a packet tagged with a
dsa_8021q header, header which it wouldn't be able to remove, and which
would hence "leak" out.

The only use case I tested (due to lack of board availability) was when
the sja1105 switches are part of disjoint trees (however, this doesn't
change the fact that multiple sja1105 switches still need unique switch
identifiers in such a system). But in principle, even "true" single-tree
setups (with DSA links) should work just as fine, except for a small
change which I can't test: dsa_towards_port should be used instead of
dsa_upstream_port (I made the assumption that the routing port that any
sja1105 should use towards its neighbours is the CPU port. That might
not hold true in other setups).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10 19:52:33 -07:00
Vladimir Oltean 3b7bc1f091 net: dsa: introduce a dsa_switch_find function
Somewhat similar to dsa_tree_find, dsa_switch_find returns a dsa_switch
structure pointer by searching for its tree index and switch index (the
parameters from dsa,member). To be used, for example, by drivers who
implement .crosschip_bridge_join and need a reference to the other
switch indicated to by the tree_index and sw_index arguments.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10 19:52:33 -07:00
Vladimir Oltean f66a6a69f9 net: dsa: permit cross-chip bridging between all trees in the system
One way of utilizing DSA is by cascading switches which do not all have
compatible taggers. Consider the following real-life topology:

      +---------------------------------------------------------------+
      | LS1028A                                                       |
      |               +------------------------------+                |
      |               |      DSA master for Felix    |                |
      |               |(internal ENETC port 2: eno2))|                |
      |  +------------+------------------------------+-------------+  |
      |  | Felix embedded L2 switch                                |  |
      |  |                                                         |  |
      |  | +--------------+   +--------------+   +--------------+  |  |
      |  | |DSA master for|   |DSA master for|   |DSA master for|  |  |
      |  | |  SJA1105 1   |   |  SJA1105 2   |   |  SJA1105 3   |  |  |
      |  | |(Felix port 1)|   |(Felix port 2)|   |(Felix port 3)|  |  |
      +--+-+--------------+---+--------------+---+--------------+--+--+

+-----------------------+ +-----------------------+ +-----------------------+
|   SJA1105 switch 1    | |   SJA1105 switch 2    | |   SJA1105 switch 3    |
+-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
|sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3|
+-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+

The above can be described in the device tree as follows (obviously not
complete):

mscc_felix {
	dsa,member = <0 0>;
	ports {
		port@4 {
			ethernet = <&enetc_port2>;
		};
	};
};

sja1105_switch1 {
	dsa,member = <1 1>;
	ports {
		port@4 {
			ethernet = <&mscc_felix_port1>;
		};
	};
};

sja1105_switch2 {
	dsa,member = <2 2>;
	ports {
		port@4 {
			ethernet = <&mscc_felix_port2>;
		};
	};
};

sja1105_switch3 {
	dsa,member = <3 3>;
	ports {
		port@4 {
			ethernet = <&mscc_felix_port3>;
		};
	};
};

Basically we instantiate one DSA switch tree for every hardware switch
in the system, but we still give them globally unique switch IDs (will
come back to that later). Having 3 disjoint switch trees makes the
tagger drivers "just work", because net devices are registered for the
3 Felix DSA master ports, and they are also DSA slave ports to the ENETC
port. So packets received on the ENETC port are stripped of their
stacked DSA tags one by one.

Currently, hardware bridging between ports on the same sja1105 chip is
possible, but switching between sja1105 ports on different chips is
handled by the software bridge. This is fine, but we can do better.

In fact, the dsa_8021q tag used by sja1105 is compatible with cascading.
In other words, a sja1105 switch can correctly parse and route a packet
containing a dsa_8021q tag. So if we could enable hardware bridging on
the Felix DSA master ports, cross-chip bridging could be completely
offloaded.

Such as system would be used as follows:

ip link add dev br0 type bridge && ip link set dev br0 up
for port in sw0p0 sw0p1 sw0p2 sw0p3 \
	    sw1p0 sw1p1 sw1p2 sw1p3 \
	    sw2p0 sw2p1 sw2p2 sw2p3; do
	ip link set dev $port master br0
done

The above makes switching between ports on the same row be performed in
hardware, and between ports on different rows in software. Now assume
the Felix switch ports are called swp0, swp1, swp2. By running the
following extra commands:

ip link add dev br1 type bridge && ip link set dev br1 up
for port in swp0 swp1 swp2; do
	ip link set dev $port master br1
done

the CPU no longer sees packets which traverse sja1105 switch boundaries
and can be forwarded directly by Felix. The br1 bridge would not be used
for any sort of traffic termination.

For this to work, we need to give drivers an opportunity to listen for
bridging events on DSA trees other than their own, and pass that other
tree index as argument. I have made the assumption, for the moment, that
the other existing DSA notifiers don't need to be broadcast to other
trees. That assumption might turn out to be incorrect. But in the
meantime, introduce a dsa_broadcast function, similar in purpose to
dsa_port_notify, which is used only by the bridging notifiers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10 19:52:33 -07:00
Eric Dumazet f78ed2204d netpoll: accept NULL np argument in netpoll_send_skb()
netpoll_send_skb() callers seem to leak skb if
the np pointer is NULL. While this should not happen, we
can make the code more robust.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07 18:11:07 -07:00
Jacob Keller c75a33c84b net: remove newlines in NL_SET_ERR_MSG_MOD
The NL_SET_ERR_MSG_MOD macro is used to report a string describing an
error message to userspace via the netlink extended ACK structure. It
should not have a trailing newline.

Add a cocci script which catches cases where the newline marker is
present. Using this script, fix the handful of cases which accidentally
included a trailing new line.

I couldn't figure out a way to get a patch mode working, so this script
only implements context, report, and org.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07 17:56:14 -07:00
Vladimir Oltean e1eea81120 net: dsa: introduce a dsa_port_from_netdev public helper
As its implementation shows, this is synonimous with calling
dsa_slave_dev_check followed by dsa_slave_to_port, so it is quite simple
already and provides functionality which is already there.

However there is now a need for these functions outside dsa_priv.h, for
example in drivers that perform mirroring and redirection through
tc-flower offloads (they are given raw access to the flow_cls_offload
structure), where they need to call this function on act->dev.

But simply exporting dsa_slave_to_port would make it non-inline and
would result in an extra function call in the hotpath, as can be seen
for example in sja1105:

Before:

000006dc <sja1105_xmit>:
{
 6dc:	e92d4ff0 	push	{r4, r5, r6, r7, r8, r9, sl, fp, lr}
 6e0:	e1a04000 	mov	r4, r0
 6e4:	e591958c 	ldr	r9, [r1, #1420]	; 0x58c <- Inline dsa_slave_to_port
 6e8:	e1a05001 	mov	r5, r1
 6ec:	e24dd004 	sub	sp, sp, #4
	u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index);
 6f0:	e1c901d8 	ldrd	r0, [r9, #24]
 6f4:	ebfffffe 	bl	0 <dsa_8021q_tx_vid>
			6f4: R_ARM_CALL	dsa_8021q_tx_vid
	u8 pcp = netdev_txq_to_tc(netdev, queue_mapping);
 6f8:	e1d416b0 	ldrh	r1, [r4, #96]	; 0x60
	u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index);
 6fc:	e1a08000 	mov	r8, r0

After:

000006e4 <sja1105_xmit>:
{
 6e4:	e92d4ff0 	push	{r4, r5, r6, r7, r8, r9, sl, fp, lr}
 6e8:	e1a04000 	mov	r4, r0
 6ec:	e24dd004 	sub	sp, sp, #4
	struct dsa_port *dp = dsa_slave_to_port(netdev);
 6f0:	e1a00001 	mov	r0, r1
{
 6f4:	e1a05001 	mov	r5, r1
	struct dsa_port *dp = dsa_slave_to_port(netdev);
 6f8:	ebfffffe 	bl	0 <dsa_slave_to_port>
			6f8: R_ARM_CALL	dsa_slave_to_port
 6fc:	e1a09000 	mov	r9, r0
	u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index);
 700:	e1c001d8 	ldrd	r0, [r0, #24]
 704:	ebfffffe 	bl	0 <dsa_8021q_tx_vid>
			704: R_ARM_CALL	dsa_8021q_tx_vid

Because we want to avoid possible performance regressions, introduce
this new function which is designed to be public.

Suggested-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07 17:31:57 -07:00
David S. Miller 3793faad7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts were all overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 22:10:13 -07:00
Florian Fainelli 050569fc83 net: dsa: Do not leave DSA master with NULL netdev_ops
When ndo_get_phys_port_name() for the CPU port was added we introduced
an early check for when the DSA master network device in
dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When
we perform the teardown operation in dsa_master_ndo_teardown() we would
not be checking that cpu_dp->orig_ndo_ops was successfully allocated and
non-NULL initialized.

With network device drivers such as virtio_net, this leads to a NPD as
soon as the DSA switch hanging off of it gets torn down because we are
now assigning the virtio_net device's netdev_ops a NULL pointer.

Fixes: da7b9e9b00 ("net: dsa: Add ndo_get_phys_port_name() for CPU port")
Reported-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:31:54 -07:00
Vladimir Oltean 657221598f net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
This was caused by a poor merge conflict resolution on my side. The
"act = &cls->rule->action.entries[0];" assignment was already present in
the code prior to the patch mentioned below.

Fixes: e13c207528 ("net: dsa: refactor matchall mirred action to separate function")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:30:35 -07:00
Cong Wang 1a33e10e4a net: partially revert dynamic lockdep key changes
This patch reverts the folowing commits:

commit 064ff66e2b
"bonding: add missing netdev_update_lockdep_key()"

commit 53d374979e
"net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()"

commit 1f26c0d3d2
"net: fix kernel-doc warning in <linux/netdevice.h>"

commit ab92d68fc2
"net: core: add generic lockdep keys"

but keeps the addr_list_lock_key because we still lock
addr_list_lock nestedly on stack devices, unlikely xmit_lock
this is safe because we don't take addr_list_lock on any fast
path.

Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 12:05:56 -07:00
Florian Fainelli 86f8b1c01a net: dsa: Do not make user port errors fatal
Prior to 1d27732f41 ("net: dsa: setup and teardown ports"), we would
not treat failures to set-up an user port as fatal, but after this
commit we would, which is a regression for some systems where interfaces
may be declared in the Device Tree, but the underlying hardware may not
be present (pluggable daughter cards for instance).

Fixes: 1d27732f41 ("net: dsa: setup and teardown ports")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 10:29:47 -07:00
David S. Miller d483389678 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Simple overlapping changes to linux/vermagic.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25 20:18:53 -07:00
Russell King 5c05c1dbb1 net: phylink, dsa: eliminate phylink_fixed_state_cb()
Move the callback into the phylink_config structure, rather than
providing a callback to set this up.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24 16:45:37 -07:00
Alexander Lobakin e131a56348 net: dsa: add GRO support via gro_cells
gro_cells lib is used by different encapsulating netdevices, such as
geneve, macsec, vxlan etc. to speed up decapsulated traffic processing.
CPU tag is a sort of "encapsulation", and we can use the same mechs to
greatly improve overall DSA performance.
skbs are passed to the GRO layer after removing CPU tags, so we don't
need any new packet offload types as it was firstly proposed by me in
the first GRO-over-DSA variant [1].

The size of struct gro_cells is sizeof(void *), so hot struct
dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields
remain in one 32-byte cacheline.
The other positive side effect is that drivers for network devices
that can be shipped as CPU ports of DSA-driven switches can now use
napi_gro_frags() to pass skbs to kernel. Packets built that way are
completely non-linear and are likely being dropped without GRO.

This was tested on to-be-mainlined-soon Ethernet driver that uses
napi_gro_frags(), and the overall performance was on par with the
variant from [1], sometimes even better due to minimal overhead.
net.core.gro_normal_batch tuning may help to push it to the limit
on particular setups and platforms.

iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup
on 1.2 GHz MIPS board:

5.7-rc2 baseline:

[ID]  Interval         Transfer     Bitrate        Retr
[ 5]  0.00-120.01 sec  9.00 GBytes  644 Mbits/sec  413  sender
[ 5]  0.00-120.00 sec  8.99 GBytes  644 Mbits/sec       receiver

Iface      RX packets  TX packets
eth0       7097731     7097702
port0      426050      6671829
port1      6671681     425862
port1.218  6671677     425851

With this patch:

[ID]  Interval         Transfer     Bitrate        Retr
[ 5]  0.00-120.01 sec  12.2 GBytes  870 Mbits/sec  122  sender
[ 5]  0.00-120.00 sec  12.2 GBytes  870 Mbits/sec       receiver

Iface      RX packets  TX packets
eth0       9474792     9474777
port0      455200      353288
port1      9019592     455035
port1.218  353144      455024

v2:
 - Add some performance examples in the commit message;
 - No functional changes.

[1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/

Signed-off-by: Alexander Lobakin <bloodyreaper@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 12:32:13 -07:00
Vladimir Oltean 72579e14a1 net: dsa: don't fail to probe if we couldn't set the MTU
There is no reason to fail the probing of the switch if the MTU couldn't
be configured correctly (either the switch port itself, or the host
port) for whatever reason. MTU-sized traffic probably won't work, sure,
but we can still probably limp on and support some form of communication
anyway, which the users would probably appreciate more.

Fixes: bfcb813203 ("net: dsa: configure the MTU for switch ports")
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:22:59 -07:00
Andrew Lunn 3be98b2d5f net: dsa: Down cpu/dsa ports phylink will control
DSA and CPU ports can be configured in two ways. By default, the
driver should configure such ports to there maximum bandwidth. For
most use cases, this is sufficient. When this default is insufficient,
a phylink instance can be bound to such ports, and phylink will
configure the port, e.g. based on fixed-link properties. phylink
assumes the port is initially down. Given that the driver should have
already configured it to its maximum speed, ask the driver to down
the port before instantiating the phylink instance.

Fixes: 30c4a5b0aa ("net: mv88e6xxx: use resolved link config in mac_link_up()")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-14 16:33:26 -07:00
kbuild test robot bf88dc327d net: dsa: dsa_bridge_mtu_normalization() can be static
Fixes: f41071407c85 ("net: dsa: implement auto-normalization of MTU for bridge hardware datapath")
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02 06:51:56 -07:00
Russell King 765bda93d0 net: dsa: fix oops while probing Marvell DSA switches
Fix an oops in dsa_port_phylink_mac_change() caused by a combination
of a20f997010 ("net: dsa: Don't instantiate phylink for CPU/DSA
ports unless needed") and the net-dsa-improve-serdes-integration
series of patches 65b7a2c8e3 ("Merge branch
'net-dsa-improve-serdes-integration'").

Unable to handle kernel NULL pointer dereference at virtual address 00000124
pgd = c0004000
[00000124] *pgd=00000000
Internal error: Oops: 805 [#1] SMP ARM
Modules linked in: tag_edsa spi_nor mtd xhci_plat_hcd mv88e6xxx(+) xhci_hcd armada_thermal marvell_cesa dsa_core ehci_orion libdes phy_armada38x_comphy at24 mcp3021 sfp evbug spi_orion sff mdio_i2c
CPU: 1 PID: 214 Comm: irq/55-mv88e6xx Not tainted 5.6.0+ #470
Hardware name: Marvell Armada 380/385 (Device Tree)
PC is at phylink_mac_change+0x10/0x88
LR is at mv88e6352_serdes_irq_status+0x74/0x94 [mv88e6xxx]

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31 10:09:07 -07:00
Vladimir Oltean 342971766c net: dsa: add port policers
The approach taken to pass the port policer methods on to drivers is
pragmatic. It is similar to the port mirroring implementation (in that
the DSA core does all of the filter block interaction and only passes
simple operations for the driver to implement) and dissimilar to how
flow-based policers are going to be implemented (where the driver has
full control over the flow_cls_offload data structure).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:44:00 -07:00
Vladimir Oltean e13c207528 net: dsa: refactor matchall mirred action to separate function
Make room for other actions for the matchall filter by keeping the
mirred argument parsing self-contained in its own function.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:44:00 -07:00
Christophe JAILLET ee91a83e08 net: dsa: Simplify 'dsa_tag_protocol_to_str()'
There is no point in preparing the module name in a buffer. The format
string can be passed diectly to 'request_module()'.

This axes a few lines of code and cleans a few things:
   - max len for a driver name is MODULE_NAME_LEN wich is ~ 60 chars,
     not 128. It would be down-sized in 'request_module()'
   - we should pass the total size of the buffer to 'snprintf()', not the
     size minus 1

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:54:57 -07:00
Vladimir Oltean bff33f7e2a net: dsa: implement auto-normalization of MTU for bridge hardware datapath
Many switches don't have an explicit knob for configuring the MTU
(maximum transmission unit per interface).  Instead, they do the
length-based packet admission checks on the ingress interface, for
reasons that are easy to understand (why would you accept a packet in
the queuing subsystem if you know you're going to drop it anyway).

So it is actually the MRU that these switches permit configuring.

In Linux there only exists the IFLA_MTU netlink attribute and the
associated dev_set_mtu function. The comments like to play blind and say
that it's changing the "maximum transfer unit", which is to say that
there isn't any directionality in the meaning of the MTU word. So that
is the interpretation that this patch is giving to things: MTU == MRU.

When 2 interfaces having different MTUs are bridged, the bridge driver
MTU auto-adjustment logic kicks in: what br_mtu_auto_adjust() does is it
adjusts the MTU of the bridge net device itself (and not that of the
slave net devices) to the minimum value of all slave interfaces, in
order for forwarded packets to not exceed the MTU regardless of the
interface they are received and send on.

The idea behind this behavior, and why the slave MTUs are not adjusted,
is that normal termination from Linux over the L2 forwarding domain
should happen over the bridge net device, which _is_ properly limited by
the minimum MTU. And termination over individual slave devices is
possible even if those are bridged. But that is not "forwarding", so
there's no reason to do normalization there, since only a single
interface sees that packet.

The problem with those switches that can only control the MRU is with
the offloaded data path, where a packet received on an interface with
MRU 9000 would still be forwarded to an interface with MRU 1500. And the
br_mtu_auto_adjust() function does not really help, since the MTU
configured on the bridge net device is ignored.

In order to enforce the de-facto MTU == MRU rule for these switches, we
need to do MTU normalization, which means: in order for no packet larger
than the MTU configured on this port to be sent, then we need to limit
the MRU on all ports that this packet could possibly come from. AKA
since we are configuring the MRU via MTU, it means that all ports within
a bridge forwarding domain should have the same MTU.

And that is exactly what this patch is trying to do.

>From an implementation perspective, we try to follow the intent of the
user, otherwise there is a risk that we might livelock them (they try to
change the MTU on an already-bridged interface, but we just keep
changing it back in an attempt to keep the MTU normalized). So the MTU
that the bridge is normalized to is either:

 - The most recently changed one:

   ip link set dev swp0 master br0
   ip link set dev swp1 master br0
   ip link set dev swp0 mtu 1400

   This sequence will make swp1 inherit MTU 1400 from swp0.

 - The one of the most recently added interface to the bridge:

   ip link set dev swp0 master br0
   ip link set dev swp1 mtu 1400
   ip link set dev swp1 master br0

   The above sequence will make swp0 inherit MTU 1400 as well.

Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27 16:07:25 -07:00
Vladimir Oltean bfcb813203 net: dsa: configure the MTU for switch ports
It is useful be able to configure port policers on a switch to accept
frames of various sizes:

- Increase the MTU for better throughput from the default of 1500 if it
  is known that there is no 10/100 Mbps device in the network.
- Decrease the MTU to limit the latency of high-priority frames under
  congestion, or work around various network segments that add extra
  headers to packets which can't be fragmented.

For DSA slave ports, this is mostly a pass-through callback, called
through the regular ndo ops and at probe time (to ensure consistency
across all supported switches).

The CPU port is called with an MTU equal to the largest configured MTU
of the slave ports. The assumption is that the user might want to
sustain a bidirectional conversation with a partner over any switch
port.

The DSA master is configured the same as the CPU port, plus the tagger
overhead. Since the MTU is by definition L2 payload (sans Ethernet
header), it is up to each individual driver to figure out if it needs to
do anything special for its frame tags on the CPU port (it shouldn't
except in special cases). So the MTU does not contain the tagger
overhead on the CPU port.
However the MTU of the DSA master, minus the tagger overhead, is used as
a proxy for the MTU of the CPU port, which does not have a net device.
This is to avoid uselessly calling the .change_mtu function on the CPU
port when nothing should change.

So it is safe to assume that the DSA master and the CPU port MTUs are
apart by exactly the tagger's overhead in bytes.

Some changes were made around dsa_master_set_mtu(), function which was
now removed, for 2 reasons:
  - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to
    do the same thing in DSA
  - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method
That is to say, there's no need for this function in DSA, we can safely
call dev_set_mtu() directly, take the rtnl lock when necessary, and just
propagate whatever errors get reported (since the user probably wants to
be informed).

Some inspiration (mainly in the MTU DSA notifier) was taken from a
vaguely similar patch from Murali and Florian, who are credited as
co-developers down below.

Co-developed-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Co-developed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27 16:07:24 -07:00
David S. Miller 9fb16955fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Overlapping header include additions in macsec.c

A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c

Overlapping test additions in selftests Makefile

Overlapping PCI ID table adjustments in iwlwifi driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25 18:58:11 -07:00
Vladimir Oltean e80f40cbe4 net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
Not only did this wheel did not need reinventing, but there is also
an issue with it: It doesn't remove the VLAN header in a way that
preserves the L2 payload checksum when that is being provided by the DSA
master hw.  It should recalculate checksum both for the push, before
removing the header, and for the pull afterwards. But the current
implementation is quite dizzying, with pulls followed immediately
afterwards by pushes, the memmove is done before the push, etc.  This
makes a DSA master with RX checksumming offload to print stack traces
with the infamous 'hw csum failure' message.

So remove the dsa_8021q_remove_header function and replace it with
something that actually works with inet checksumming.

Fixes: d461933638 ("net: dsa: tag_8021q: Create helper function for removing VLAN header")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24 16:19:01 -07:00
Florian Fainelli 52015366e3 net: dsa: Implement flow dissection for tag_brcm.c
Provide a flow_dissect callback which returns the network offset and
where to find the skb protocol, given the tags structure a common
function works for both tagging formats that are supported.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 21:48:59 -07:00
Florian Fainelli 0e62f543be net: dsa: Fix duplicate frames flooded by learning
When both the switch and the bridge are learning about new addresses,
switch ports attached to the bridge would see duplicate ARP frames
because both entities would attempt to send them.

Fixes: 5037d532b8 ("net: dsa: add Broadcom tag RX/TX handler")
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 21:44:45 -07:00
Jakub Kicinski 53eca1f347 net: rename flow_action_hw_stats_types* -> flow_action_hw_stats*
flow_action_hw_stats_types_check() helper takes one of the
FLOW_ACTION_HW_STATS_*_BIT values as input. If we align
the arguments to the opening bracket of the helper there
is no way to call this helper and stay under 80 characters.

Remove the "types" part from the new flow_action helpers
and enum values.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 21:12:39 -07:00
Russell King 87615c96e7 net: dsa: warn if phylink_mac_link_state returns error
Issue a warning to the kernel log if phylink_mac_link_state() returns
an error. This should not occur, but let's make it visible.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:12 -07:00
David S. Miller 1d34357931 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 22:34:48 -07:00
Andrew Lunn a20f997010 net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
By default, DSA drivers should configure CPU and DSA ports to their
maximum speed. In many configurations this is sufficient to make the
link work.

In some cases it is necessary to configure the link to run slower,
e.g. because of limitations of the SoC it is connected to. Or back to
back PHYs are used and the PHY needs to be driven in order to
establish link. In this case, phylink is used.

Only instantiate phylink if it is required. If there is no PHY, or no
fixed link properties, phylink can upset a link which works in the
default configuration.

Fixes: 0e27921816 ("net: dsa: Use PHYLINK for the CPU/DSA ports")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11 23:46:11 -07:00
Jiri Pirko 319a1d1947 flow_offload: check for basic action hw stats type
Introduce flow_action_basic_hw_stats_types_check() helper and use it
in drivers. That sanitizes the drivers which do not have support
for action HW stats types.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08 21:07:48 -07:00
Vladimir Oltean 69df578c5f net: mscc: ocelot: eliminate confusion between CPU and NPI port
Ocelot has the concept of a CPU port. The CPU port is represented in the
forwarding and the queueing system, but it is not a physical device. The
CPU port can either be accessed via register-based injection/extraction
(which is the case of Ocelot), via Frame-DMA (similar to the first one),
or "connected" to a physical Ethernet port (called NPI in the datasheet)
which is the case of the Felix DSA switch.

In Ocelot the CPU port is at index 11.
In Felix the CPU port is at index 6.

The CPU bit is treated special in the forwarding, as it is never cleared
from the forwarding port mask (once added to it). Other than that, it is
treated the same as a normal front port.

Both Felix and Ocelot should use the CPU port in the same way. This
means that Felix should not use the NPI port directly when forwarding to
the CPU, but instead use the CPU port.

This patch is fixing this such that Felix will use port 6 as its CPU
port, and just use the NPI port to carry the traffic.

Therefore, eliminate the "ocelot->cpu" variable which was holding the
index of the NPI port for Felix, and the index of the CPU port module
for Ocelot, so the variable was actually configuring different things
for different drivers and causing at least part of the confusion.

Also remove the "ocelot->num_cpu_ports" variable, which is the result of
another confusion. The 2 CPU ports mentioned in the datasheet are
because there are two frame extraction channels (register based or DMA
based). This is of no relevance to the driver at the moment, and
invisible to the analyzer module.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Suggested-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-04 14:19:00 -08:00
Vladimir Oltean ed11bb1f96 net: dsa: Add bypass operations for the flower classifier-action filter
Due to the immense variety of classification keys and actions available
for tc-flower, as well as due to potentially very different DSA switch
capabilities, it doesn't make a lot of sense for the DSA mid layer to
even attempt to interpret these. So just pass them on to the underlying
switch driver.

DSA implements just the standard boilerplate for binding and unbinding
flow blocks to ports, since nobody wants to deal with that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03 18:57:49 -08:00
Russell King 8640f8dc6d net: dsa: fix phylink_start()/phylink_stop() calls
Place phylink_start()/phylink_stop() inside dsa_port_enable() and
dsa_port_disable(), which ensures that we call phylink_stop() before
tearing down phylink - which is a documented requirement.  Failure
to do so can cause use-after-free bugs.

Fixes: 0e27921816 ("net: dsa: Use PHYLINK for the CPU/DSA ports")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03 15:45:49 -08:00
Russell King 5b502a7b29 net: dsa: propagate resolved link config via mac_link_up()
Propagate the resolved link configuration down via DSA's
phylink_mac_link_up() operation to allow split PCS/MAC to work.

Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Russell King 91a208f218 net: phylink: propagate resolved link config via mac_link_up()
Propagate the resolved link parameters via the mac_link_up() call for
MACs that do not automatically track their PCS state. We propagate the
link parameters via function arguments so that inappropriate members
of struct phylink_link_state can't be accessed, and creating a new
structure just for this adds needless complexity to the API.

Tested-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 12:02:14 -08:00
Per Forlin ddc9abaf5d net: dsa: tag_ar9331: Make sure there is headroom for tag
Passing tag size to skb_cow_head will make sure
there is enough headroom for the tag data.
This change does not introduce any overhead in case there
is already available headroom for tag.

Signed-off-by: Per Forlin <perfn@axis.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-14 07:34:51 -08:00
Per Forlin 04fb91243a net: dsa: tag_qca: Make sure there is headroom for tag
Passing tag size to skb_cow_head will make sure
there is enough headroom for the tag data.
This change does not introduce any overhead in case there
is already available headroom for tag.

Signed-off-by: Per Forlin <perfn@axis.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-14 07:34:51 -08:00
Vladimir Oltean 6dc43cd3aa net: dsa: Fix use-after-free in probing of DSA switch tree
DSA sets up a switch tree little by little. Every switch of the N
members of the tree calls dsa_register_switch, and (N - 1) will just
touch the dst->ports list with their ports and quickly exit. Only the
last switch that calls dsa_register_switch will find all DSA links
complete in dsa_tree_setup_routing_table, and not return zero as a
result but instead go ahead and set up the entire DSA switch tree
(practically on behalf of the other switches too).

The trouble is that the (N - 1) switches don't clean up after themselves
after they get an error such as EPROBE_DEFER. Their footprint left in
dst->ports by dsa_switch_touch_ports is still there. And switch N, the
one responsible with actually setting up the tree, is going to work with
those stale dp, dp->ds and dp->ds->dev pointers. In particular ds and
ds->dev might get freed by the device driver.

Be there a 2-switch tree and the following calling order:
- Switch 1 calls dsa_register_switch
  - Calls dsa_switch_touch_ports, populates dst->ports
  - Calls dsa_port_parse_cpu, gets -EPROBE_DEFER, exits.
- Switch 2 calls dsa_register_switch
  - Calls dsa_switch_touch_ports, populates dst->ports
  - Probe doesn't get deferred, so it goes ahead.
  - Calls dsa_tree_setup_routing_table, which returns "complete == true"
    due to Switch 1 having called dsa_switch_touch_ports before.
  - Because the DSA links are complete, it calls dsa_tree_setup_switches
    now.
  - dsa_tree_setup_switches iterates through dst->ports, initializing
    the Switch 1 ds structure (invalid) and the Switch 2 ds structure
    (valid).
  - Undefined behavior (use after free, sometimes NULL pointers, etc).

Real example below (debugging prints added by me, as well as guards
against NULL pointers):

[    5.477947] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.313002] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.319932] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.329693] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.339458] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.349226] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.358991] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.368758] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.378524] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.388291] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.398057] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.407912] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.417682] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.427446] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.437212] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.446979] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.456744] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.466512] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.476277] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.486043] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.495810] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.505577] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.515433] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.354120] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.361045] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.370805] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.380571] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.390337] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.400104] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.409872] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.419637] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.429403] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.439169] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803db15b80 (dev ffffff803d8e4800)

The solution is to recognize that the functions that call
dsa_switch_touch_ports (dsa_switch_parse_of, dsa_switch_parse) have side
effects, and therefore one should clean up their side effects on error
path. The cleanup of dst->ports was taken from dsa_switch_remove and
moved into a dedicated dsa_switch_release_ports function, which should
really be per-switch (free only the members of dst->ports that are also
members of ds, instead of all switch ports).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27 11:12:46 +01:00
David S. Miller b3f7e3f23a Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2020-01-19 22:10:04 +01:00
Alexander Lobakin bd5874da57 net: dsa: tag_qca: fix doubled Tx statistics
DSA subsystem takes care of netdev statistics since commit 4ed70ce9f0
("net: dsa: Refactor transmit path to eliminate duplication"), so
any accounting inside tagger callbacks is redundant and can lead to
messing up the stats.
This bug is present in Qualcomm tagger since day 0.

Fixes: cafdc45c94 ("net-next: dsa: add Qualcomm tag RX/TX handler")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-16 13:59:40 +01:00
Alexander Lobakin ad32205470 net: dsa: tag_gswip: fix typo in tagger name
The correct name is GSWIP (Gigabit Switch IP). Typo was introduced in
875138f81d ("dsa: Move tagger name into its ops structure") while
moving tagger names to their structures.

Fixes: 875138f81d ("dsa: Move tagger name into its ops structure")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-16 13:58:26 +01:00
Florian Fainelli 4d776482ec net: dsa: Get information about stacked DSA protocol
It is possible to stack multiple DSA switches in a way that they are not
part of the tree (disjoint) but the DSA master of a switch is a DSA
slave of another. When that happens switch drivers may have to know this
is the case so as to determine whether their tagging protocol has a
remove chance of working.

This is useful for specific switch drivers such as b53 where devices
have been known to be stacked in the wild without the Broadcom tag
protocol supporting that feature. This allows b53 to continue supporting
those devices by forcing the disabling of Broadcom tags on the outermost
switches if necessary.

The get_tag_protocol() function is therefore updated to gain an
additional enum dsa_tag_protocol argument which denotes the current
tagging protocol used by the DSA master we are attached to, else
DSA_TAG_PROTO_NONE for the top of the dsa_switch_tree.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08 16:01:13 -08:00
Vladimir Oltean 787cac3f5a net: dsa: Pass pcs_poll flag from driver to PHYLINK
The DSA drivers that implement .phylink_mac_link_state should normally
register an interrupt for the PCS, from which they should call
phylink_mac_change(). However not all switches implement this, and those
who don't should set this flag in dsa_switch in the .setup callback, so
that PHYLINK will poll for a few ms until the in-band AN link timer
expires and the PCS state settles.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-05 23:22:32 -08:00
Vladimir Oltean 2821d50fc0 net: dsa: tag_sja1105: Slightly improve the Xmas tree in sja1105_xmit
This is a cosmetic patch that makes the dp, tx_vid, queue_mapping and
pcp local variable definitions a bit closer in length, so they don't
look like an eyesore as much.

The 'ds' variable is not used otherwise, except for ds->dp.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-05 15:13:13 -08:00
Vladimir Oltean a68578c20a net: dsa: Make deferred_xmit private to sja1105
There are 3 things that are wrong with the DSA deferred xmit mechanism:

1. Its introduction has made the DSA hotpath ever so slightly more
   inefficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs
   to be initialized to false for every transmitted frame, in order to
   figure out whether the driver requested deferral or not (a very rare
   occasion, rare even for the only driver that does use this mechanism:
   sja1105). That was necessary to avoid kfree_skb from freeing the skb.

2. Because L2 PTP is a link-local protocol like STP, it requires
   management routes and deferred xmit with this switch. But as opposed
   to STP, the deferred work mechanism needs to schedule the packet
   rather quickly for the TX timstamp to be collected in time and sent
   to user space. But there is no provision for controlling the
   scheduling priority of this deferred xmit workqueue. Too bad this is
   a rather specific requirement for a feature that nobody else uses
   (more below).

3. Perhaps most importantly, it makes the DSA core adhere a bit too
   much to the NXP company-wide policy "Innovate Where It Doesn't
   Matter". The sja1105 is probably the only DSA switch that requires
   some frames sent from the CPU to be routed to the slave port via an
   out-of-band configuration (register write) rather than in-band (DSA
   tag). And there are indeed very good reasons to not want to do that:
   if that out-of-band register is at the other end of a slow bus such
   as SPI, then you limit that Ethernet flow's throughput to effectively
   the throughput of the SPI bus. So hardware vendors should definitely
   not be encouraged to design this way. We do _not_ want more
   widespread use of this mechanism.

Luckily we have a solution for each of the 3 issues:

For 1, we can just remove that variable in the skb->cb and counteract
the effect of kfree_skb with skb_get, much to the same effect. The
advantage, of course, being that anybody who doesn't use deferred xmit
doesn't need to do any extra operation in the hotpath.

For 2, we can create a kernel thread for each port's deferred xmit work.
If the user switch ports are named swp0, swp1, swp2, the kernel threads
will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15
character length limit on kernel thread names). With this, the user can
change the scheduling priority with chrt $(pidof swp2_xmit).

For 3, we can actually move the entire implementation to the sja1105
driver.

So this patch deletes the generic implementation from the DSA core and
adds a new one, more adequate to the requirements of PTP TX
timestamping, in sja1105_main.c.

Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-05 15:13:13 -08:00
Vladimir Oltean f685e609a3 net: dsa: Deny PTP on master if switch supports it
It is possible to kill PTP on a DSA switch completely and absolutely,
until a reboot, with a simple command:

tcpdump -i eth2 -j adapter_unsynced

where eth2 is the switch's DSA master.

Why? Well, in short, the PTP API in place today is a bit rudimentary and
relies on applications to retrieve the TX timestamps by polling the
error queue and looking at the cmsg structure. But there is no timestamp
identification of any sorts (except whether it's HW or SW), you don't
know how many more timestamps are there to come, which one is this one,
from whom it is, etc. In other words, the SO_TIMESTAMPING API is
fundamentally limited in that you can get a single HW timestamp from the
stack.

And the "-j adapter_unsynced" flag of tcpdump enables hardware
timestamping.

So let's imagine what happens when the DSA master decides it wants to
deliver TX timestamps to the skb's socket too:
- The timestamp that the user space sees is taken by the DSA master.
  Whereas the RX timestamp will eventually be overwritten by the DSA
  switch. So the RX and TX timestamps will be in different time bases
  (aka garbage).
- The user space applications have no way to deal with the second (real)
  TX timestamp finally delivered by the DSA switch, or even to know to
  wait for it.

Take ptp4l from the linuxptp project, for example. This is its behavior
after running tcpdump, before the patch:

ptp4l[172]: [6469.594] Unexpected data on socket err queue:
ptp4l[172]: [6469.693] rms    8 max   16 freq -21257 +/-  11 delay   748 +/-   0
ptp4l[172]: [6469.711] Unexpected data on socket err queue:
ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 05 00 fd
ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
ptp4l[172]: [6469.721] Unexpected data on socket err queue:
ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b1 00 fd
ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
ptp4l[172]: [6469.838] Unexpected data on socket err queue:
ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 06 00 fd
ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
ptp4l[172]: [6469.848] Unexpected data on socket err queue:
ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 13 02
ptp4l[172]: 0010 00 36 00 00 02 00 00 00 00 00 00 00 00 00 00 00
ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 04 1a 45 05 7f
ptp4l[172]: 0030 00 00 5e 05 41 32 27 c2 1a 68 00 04 9f ff fe 05
ptp4l[172]: 0040 de 06 00 01
ptp4l[172]: [6469.855] Unexpected data on socket err queue:
ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b2 00 fd
ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
ptp4l[172]: [6469.974] Unexpected data on socket err queue:
ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 07 00 fd
ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00

The ptp4l program itself is heavily patched to show this (more details
here [0]). Otherwise, by default it just hangs.

On the other hand, with the DSA patch to disallow HW timestamping
applied:

tcpdump -i eth2 -j adapter_unsynced
tcpdump: SIOCSHWTSTAMP failed: Device or resource busy

So it is a fact of life that PTP timestamping on the DSA master is
incompatible with timestamping on the switch MAC, at least with the
current API. And if the switch supports PTP, taking the timestamps from
the switch MAC is highly preferable anyway, due to the fact that those
don't contain the queuing latencies of the switch. So just disallow PTP
on the DSA master if there is any PTP-capable switch attached.

[0]: https://sourceforge.net/p/linuxptp/mailman/message/36880648/

Fixes: 0336369d3a ("net: dsa: forward hardware timestamping ioctls to switch driver")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-28 11:43:41 -08:00
David S. Miller ac80010fc9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Mere overlapping changes in the conflicts here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-22 15:15:05 -08:00