In the initial commit dc452a471d ("net: dsa: introduce tagger-owned
storage for private and shared data"), we had a call to
tag_ops->disconnect(dst) issued from dsa_tree_free(), which is called at
tree teardown time.
There were problems with connecting to a switch tree as a whole, so this
got reworked to connecting to individual switches within the tree. In
this process, tag_ops->disconnect(ds) was made to be called only from
switch.c (cross-chip notifiers emitted as a result of dynamic tag proto
changes), but the normal driver teardown code path wasn't replaced with
anything.
Solve this problem by adding a function that does the opposite of
dsa_switch_setup_tag_protocol(), which is called from the equivalent
spot in dsa_switch_teardown(). The positioning here also ensures that we
won't have any use-after-free in tagging protocol (*rcv) ops, since the
teardown sequence is as follows:
dsa_tree_teardown
-> dsa_tree_teardown_master
-> dsa_master_teardown
-> unsets master->dsa_ptr, making no further packets match the
ETH_P_XDSA packet type handler
-> dsa_tree_teardown_ports
-> dsa_port_teardown
-> dsa_slave_destroy
-> unregisters DSA net devices, there is even a synchronize_net()
in unregister_netdevice_many()
-> dsa_tree_teardown_switches
-> dsa_switch_teardown
-> dsa_switch_teardown_tag_protocol
-> finally frees the tagger-owned storage
Fixes: 7f2973149c ("net: dsa: make tagging protocols connect to individual switches from a tree")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221114143551.1906361-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are multi-generational drivers like mv88e6xxx which have code like
this:
int mv88e6xxx_port_hwtstamp_get(struct dsa_switch *ds, int port,
struct ifreq *ifr)
{
if (!chip->info->ptp_support)
return -EOPNOTSUPP;
...
}
DSA wants to deny PTP timestamping on the master if the switch supports
timestamping too. However it currently relies on the presence of the
port_hwtstamp_get() callback to determine PTP capability, and this
clearly does not work in that case (method is present but returns
-EOPNOTSUPP).
We should not deny PTP on the DSA master for those switches which truly
do not support hardware timestamping.
Create a dsa_port_supports_hwtstamp() method which actually probes for
support by calling port_hwtstamp_get() and seeing whether that returned
-EOPNOTSUPP or not.
Fixes: f685e609a3 ("net: dsa: Deny PTP on master if switch supports it")
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20221110124345.3901389-1-festevam@gmail.com/
Reported-by: Fabio Estevam <festevam@gmail.com>
Reported-by: Steffen Bätz <steffen@innosonix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA tagging protocol drivers can be changed at runtime through sysfs and
at probe time through the device tree (support for the latter was added
later).
When changing through sysfs, it is assumed that the module for the new
tagging protocol was already loaded into the kernel (in fact this is
only a concern for Ocelot/Felix switches, where we have tag_ocelot.ko
and tag_ocelot_8021q.ko; for every other switch, the default and
alternative protocols are compiled within the same .ko, so there is
nothing for the user to load).
The kernel cannot currently call request_module(), because it has no way
of constructing the modalias name of the tagging protocol driver
("dsa_tag-%d", where the number is one of DSA_TAG_PROTO_*_VALUE).
The device tree only contains the string name of the tagging protocol
("ocelot-8021q"), and the only mapping between the string and the
DSA_TAG_PROTO_OCELOT_8021Q_VALUE is present in tag_ocelot_8021q.ko.
So this is a chicken-and-egg situation and dsa_core.ko has nothing based
on which it can automatically request the insertion of the module.
As a consequence, if CONFIG_NET_DSA_TAG_OCELOT_8021Q is built as module,
the switch will forever defer probing.
The long-term solution is to make DSA call request_module() somehow,
but that probably needs some refactoring.
What we can do to keep operating with existing device tree blobs is to
cancel the attempt to change the tagging protocol with the one specified
there, and to remain operating with the default one. Depending on the
situation, the default protocol might still allow some functionality
(in the case of ocelot, it does), and it's better to have that than to
fail to probe.
Fixes: deff710703 ("net: dsa: Allow default tag protocol to be overridden from DT")
Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/
Reported-by: Heiko Thiery <heiko.thiery@gmail.com>
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221027145439.3086017-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Return zero if both dsa_slave_dev_check() and netdev_uses_dsa() are false.
Fixes: acc43b7bf5 ("net: dsa: allow masters to join a LAG")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix wrong pointer passed to PTR_ERR() in dsa_port_phylink_create() to print
error message.
Fixes: cf5ca4ddc3 ("net: dsa: don't leave dangling pointers in dp->pl when failing")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since dsa_port_devlink_setup() and dsa_port_devlink_teardown() are
already called from code paths which only execute once per port (due to
the existing bool dp->setup), keeping another dp->devlink_port_setup is
redundant, because we can already manage to balance the calls properly
(and not call teardown when setup was never called, or call setup twice,
or things like that).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 3122433eb5 ("net: dsa: Register devlink ports before calling DSA driver setup()")
moved devlink port setup to be done early before driver setup()
was called. That is no longer needed, so move the devlink port
initialization back to dsa_port_setup(), as the first thing done there.
Note there is no longer needed to reinit port as unused if
dsa_port_setup() fails, as it unregisters the devlink port instance on
the error path.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a desire to simplify the dsa_port registration path with
devlink, and this involves reworking a bit how user ports which fail to
connect to their PHY (because it's missing) get reinitialized as UNUSED
devlink ports.
The desire is for the change to look something like this; basically
dsa_port_setup() has failed, we just change dp->type and call
dsa_port_setup() again.
-/* Destroy the current devlink port, and create a new one which has the UNUSED
- * flavour.
- */
-static int dsa_port_reinit_as_unused(struct dsa_port *dp)
+static int dsa_port_setup_as_unused(struct dsa_port *dp)
{
- dsa_port_devlink_teardown(dp);
dp->type = DSA_PORT_TYPE_UNUSED;
- return dsa_port_devlink_setup(dp);
+ return dsa_port_setup(dp);
}
For an UNUSED port, dsa_port_setup() mostly only calls dsa_port_devlink_setup()
anyway, so we could get away with calling just that. But if we call the
full blown dsa_port_setup(dp) (which will be needed to properly set
dp->setup = true), the callee will have the tendency to go through this
code block too, and call dsa_port_disable(dp):
switch (dp->type) {
case DSA_PORT_TYPE_UNUSED:
dsa_port_disable(dp);
break;
That is not very good, because dsa_port_disable() has this hidden inside
of it:
if (dp->pl)
phylink_stop(dp->pl);
Fact is, we are not prepared to handle a call to dsa_port_disable() with
a struct dsa_port that came from a previous (and failed) call to
dsa_port_setup(). We do not clean up dp->pl, and this will make the
second call to dsa_port_setup() call phylink_stop() on a dangling dp->pl
pointer.
Solve this by creating an API for phylink destruction which is symmetric
to the phylink creation, and never leave dp->pl set to anything except
NULL or a valid phylink structure.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move port_setup() op to be called before devlink_port_register() and
port_teardown() after devlink_port_unregister().
Note it makes sense to move this alongside the rest of the devlink port
code, the reinit() function also gets much nicer, as clearly the fact that
port_setup()->devlink_port_region_create() was called in dsa_port_setup
did not fit the flow.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As pointed out during review, currently the following set of commands
crashes the kernel:
$ ip netns add ns0
$ ip link set swp0 netns ns0
$ ip netns del ns0
WARNING: CPU: 1 PID: 27 at net/core/dev.c:10884 unregister_netdevice_many+0xaa4/0xaec
Workqueue: netns cleanup_net
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : unregister_netdevice_many+0xaa4/0xaec
lr : unregister_netdevice_many+0x700/0xaec
Call trace:
unregister_netdevice_many+0xaa4/0xaec
default_device_exit_batch+0x294/0x340
ops_exit_list+0xac/0xc4
cleanup_net+0x2e4/0x544
process_one_work+0x4ec/0xb40
---[ end trace 0000000000000000 ]---
unregister_netdevice: waiting for swp0 to become free. Usage count = 2
This is because since DSA user ports, since they started populating
dev->rtnl_link_ops in the blamed commit, gained a different treatment
from default_device_exit_net(), which thinks these interfaces can now be
unregistered.
They can't; so set netns_refund = true to restore the behavior prior to
populating dev->rtnl_link_ops.
Fixes: 95f510d0b7 ("net: dsa: allow the DSA master to be seen and changed through rtnetlink")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220921185428.1767001-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are 2 ways in which a DSA user port may become handled by 2 CPU
ports in a LAG:
(1) its current DSA master joins a LAG
ip link del bond0 && ip link add bond0 type bond mode 802.3ad
ip link set eno2 master bond0
When this happens, all user ports with "eno2" as DSA master get
automatically migrated to "bond0" as DSA master.
(2) it is explicitly configured as such by the user
# Before, the DSA master was eno3
ip link set swp0 type dsa master bond0
The design of this configuration is that the LAG device dynamically
becomes a DSA master through dsa_master_setup() when the first physical
DSA master becomes a LAG slave, and stops being so through
dsa_master_teardown() when the last physical DSA master leaves.
A LAG interface is considered as a valid DSA master only if it contains
existing DSA masters, and no other lower interfaces. Therefore, we
mainly rely on method (1) to enter this configuration.
Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when
it becomes a standalone DSA master again. But the LAG master also has a
dev->dsa_ptr, and this is actually duplicated from one of the physical
LAG slaves, and therefore needs to be balanced when LAG slaves come and
go.
To the switch driver, putting DSA masters in a LAG is seen as putting
their associated CPU ports in a LAG.
We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG,
by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Drivers could refuse to offload a LAG configuration for a variety of
reasons, mainly having to do with its TX type. Additionally, since DSA
masters may now also be LAG interfaces, and this will translate into a
call to port_lag_join on the CPU ports, there may be extra restrictions
there. Propagate the netlink extack to this DSA method in order for
drivers to give a meaningful error message back to the user.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
These don't work (print a harmless error about the operation failing)
and make little sense to have anyway, because when a LAG DSA master goes
away, we will introduce logic to move our CPU port back to the first
physical DSA master. So suppress these device links in preparation for
adding support for LAG DSA masters.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Similar to the discussion about tracking the admin/oper state of LAG DSA
masters, we have the problem here that struct dsa_port *cpu_dp caches a
single pair of orig_ethtool_ops and netdev_ops pointers.
So if we call dsa_master_setup(bond0, cpu_dp) where cpu_dp is also the
dev->dsa_ptr of one of the physical DSA masters, we'd effectively
overwrite what we cached from that physical netdev with what replaced
from the bonding interface.
We don't need DSA ethtool stats on the bonding interface when used as
DSA master, it's good enough to have them just on the physical DSA
masters, so suppress this logic.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We store information about the DSA master's state in
cpu_dp->master_admin_up and cpu_dp->master_oper_up, and this assumes a
bijective association between a CPU port and a DSA master.
However, when we have CPU ports in a LAG (and DSA masters in a LAG too),
the way in which we set up things is that the physical DSA masters still
have dev->dsa_ptr pointing to our cpu_dp, but the bonding/team device
itself also has its dev->dsa_ptr pointing towards one of the CPU port
structures (the first one).
So logically speaking, that first cpu_dp can't keep track of both the
physical master's admin/oper state, and of the bonding master's state.
This isn't even needed; the reason why we keep track of the DSA master's
state is to know when it is available for Ethernet-based register access.
For that use case, we don't even need LAG; we just need to decide upon
one of the physical DSA masters (if there is more than 1 available) and
use that.
This change suppresses dsa_tree_master_{admin,oper}_state_change() calls
on LAG DSA masters (which will be supported in a future change), to
allow the tracking of just physical DSA masters.
Link: https://lore.kernel.org/netdev/628cc94d.1c69fb81.15b0d.422d@mx.google.com/
Suggested-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Some DSA switches have multiple CPU ports, which can be used to improve
CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(),
sets up only the first one, leading to suboptimal use of hardware.
The desire is to not change the default configuration but to permit the
user to create a dynamic mapping between individual user ports and the
CPU port that they are served by, configurable through rtnetlink. It is
also intended to permit load balancing between CPU ports, and in that
case, the foreseen model is for the DSA master to be a bonding interface
whose lowers are the physical DSA masters.
To that end, we create a struct rtnl_link_ops for DSA user ports with
the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that
contains the ifindex of the newly desired DSA master.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There is a desire to support for DSA masters in a LAG.
That configuration is intended to work by simply enslaving the master to
a bonding/team device. But the physical DSA master (the LAG slave) still
has a dev->dsa_ptr, and that cpu_dp still corresponds to the physical
CPU port.
However, we would like to be able to retrieve the LAG that's the upper
of the physical DSA master. In preparation for that, introduce a helper
called dsa_port_get_master() that replaces all occurrences of the
dp->cpu_dp->master pattern. The distinction between LAG and non-LAG will
be made later within the helper itself.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In case the source port cannot be decoded, print the warning only once. This
still brings attention to the user and does not spam the logs at the same time.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a driver returns -EOPNOTSUPP in dsa_port_bridge_join() but failed
to provide a reason for it, DSA attempts to set the extack to say that
software fallback will kick in.
The problem is, when we use brctl and the legacy bridge ioctls, the
extack will be NULL, and DSA dereferences it in the process of setting
it.
Sergei Antonov proves this using the following stack trace:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
PC is at dsa_slave_changeupper+0x5c/0x158
dsa_slave_changeupper from raw_notifier_call_chain+0x38/0x6c
raw_notifier_call_chain from __netdev_upper_dev_link+0x198/0x3b4
__netdev_upper_dev_link from netdev_master_upper_dev_link+0x50/0x78
netdev_master_upper_dev_link from br_add_if+0x430/0x7f4
br_add_if from br_ioctl_stub+0x170/0x530
br_ioctl_stub from br_ioctl_call+0x54/0x7c
br_ioctl_call from dev_ifsioc+0x4e0/0x6bc
dev_ifsioc from dev_ioctl+0x2f8/0x758
dev_ioctl from sock_ioctl+0x5f0/0x674
sock_ioctl from sys_ioctl+0x518/0xe40
sys_ioctl from ret_fast_syscall+0x0/0x1c
Fix the problem by only overriding the extack if non-NULL.
Fixes: 1c6e8088d9 ("net: dsa: allow port_bridge_join() to override extack message")
Link: https://lore.kernel.org/netdev/CABikg9wx7vB5eRDAYtvAm7fprJ09Ta27a4ZazC=NX5K4wn6pWA@mail.gmail.com/
Reported-by: Sergei Antonov <saproj@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Sergei Antonov <saproj@gmail.com>
Link: https://lore.kernel.org/r/20220819173925.3581871-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
More logic will be added to dsa_tree_setup_master() and
dsa_tree_teardown_master() in upcoming changes.
Reduce the indentation by one level in these functions by introducing
and using a dedicated iterator for CPU ports of a tree.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The fact that the tagging protocol is set and queried from the
/sys/class/net/<dsa-master>/dsa/tagging file is a bit of a quirk from
the single CPU port days which isn't aging very well now that DSA can
have more than a single CPU port. This is because the tagging protocol
is a switch property, yet in the presence of multiple CPU ports it can
be queried and set from multiple sysfs files, all of which are handled
by the same implementation.
The current logic ensures that the net device whose sysfs file we're
changing the tagging protocol through must be down. That net device is
the DSA master, and this is fine for single DSA master / CPU port setups.
But exactly because the tagging protocol is per switch [ tree, in fact ]
and not per DSA master, this isn't fine any longer with multiple CPU
ports, and we must iterate through the tree and find all DSA masters,
and make sure that all of them are down.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This is an adaptation of commit c0a8a9c274 ("net: dsa: automatically
bring user ports down when master goes down") for multiple DSA masters.
When a DSA master goes down, only the user ports under its control
should go down too, the others can still send/receive traffic.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
All the traffic to/from a DSA master is supposed to be distributed among
its DSA switch upper interfaces, so we should not allow other upper
device kinds.
An exception to this is DSA_TAG_PROTO_NONE (switches with no DSA tags),
and in that case it is actually expected to create e.g. VLAN interfaces
on the master. But for those, netdev_uses_dsa(master) returns false, so
the restriction doesn't apply.
The motivation for this change is to allow LAG interfaces of DSA masters
to be DSA masters themselves. We want to restrict the user's degrees of
freedom by 1: the LAG should already have all DSA masters as lowers, and
while lower ports of the LAG can be removed, none can be added after the
fact.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When DSA gains support for multiple CPU ports in a LAG, it will become
mandatory to monitor the changeupper events for the DSA master.
In fact, there are already some restrictions to be imposed in that area,
namely that a DSA master cannot be a bridge port except in some special
circumstances.
Centralize the restrictions at the level of the DSA layer as a
preliminary step.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
dsa_slave_prechangeupper_sanity_check() is supposed to enforce some
adjacency restrictions, and calls ds->ops->port_prechangeupper if the
driver implements it.
We convert the error code from the port_prechangeupper() call to a
notifier code, and 0 is converted to NOTIFY_OK, but the caller of
dsa_slave_prechangeupper_sanity_check() stops at any notifier code
different from NOTIFY_DONE.
Avoid this by converting back the notifier code to an error code, so
that both NOTIFY_OK and NOTIFY_DONE will be seen as 0. This allows more
parallel sanity check functions to be added.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Traditionally, DSA has had a single netdev notifier handling function
for each device type.
For the sake of code cleanliness, we would like to introduce more
handling functions which do one thing, but the conditions for entering
these functions start to overlap. Example: a handling function which
tracks whether any bridges contain both DSA and non-DSA interfaces.
Either this is placed before dsa_slave_changeupper(), case in which it
will prevent that function from executing, or we place it after
dsa_slave_changeupper(), case in which we will prevent it from
executing. The other alternative is to ignore errors from the new
handling function (not ideal).
To support this usage, we need to change the pattern. In the new model,
we enter all notifier handling sub-functions, and exit with NOTIFY_DONE
if there is nothing to do. This allows the sub-functions to be
relatively free-form and independent from each other.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Since commit 129bd7ca8a ("net: dsa: Prevent usage of NET_DSA_TAG_8021Q
as tagging protocol"), dsa_8021q_netdev_ops no longer exists, so remove
the comment that talks about it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220818143808.2808393-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Early DSA drivers were kind of simplistic in that they assumed a fairly
narrow hardware layout. User ports would have integrated PHYs at an
internal MDIO address that is derivable from the port number, and shared
(DSA and CPU) ports would have an MII-style (serial or parallel)
connection to another MAC. Phylib and then phylink were used to drive
the internal PHYs, and this needed little to no description through the
platform data structures. Bringing up the shared ports at the maximum
supported link speed was the responsibility of the drivers.
As a result of this, when these early drivers were converted from
platform data to the new DSA OF bindings, there was no link information
translated into the first DT bindings.
https://lore.kernel.org/all/YtXFtTsf++AeDm1l@lunn.ch/
Later, phylink was adopted for shared ports as well, and today we have a
workaround in place, introduced by commit a20f997010 ("net: dsa: Don't
instantiate phylink for CPU/DSA ports unless needed"). There, DSA checks
for the presence of phy-handle/fixed-link/managed OF properties, and if
missing, phylink registration would be skipped. This is because phylink
is optional for some drivers (the shared ports already work without it),
but the process of starting to register a port with phylink is
irreversible: if phylink_create() fails to find the fwnode properties it
needs, it bails out and it leaves the ports inoperational (because
phylink expects ports to be initially down, so DSA necessarily takes
them down, and doesn't know how to put them back up again).
DSA being a common framework, new drivers opt into this workaround
willy-nilly, but the ideal behavior from the DSA core's side would have
been to not interfere with phylink's process of failing at all. This
isn't possible because of regression concerns with pre-phylink DT blobs,
but at least DSA should put a stop to the proliferation of more of such
cases that rely on the workaround to skip phylink registration, and
sanitize the environment that new drivers work in.
To that end, create a list of compatible strings for which the
workaround is preserved, and don't apply the workaround for any drivers
outside that list (this includes new drivers).
In some cases, we make the assumption that even existing drivers don't
rely on DSA's workaround, and we do this by looking at the device trees
in which they appear. We can't fully know what is the situation with
downstream DT blobs, but we can guess the overall trend by studying the
DT blobs that were submitted upstream. If there are upstream blobs that
have lacking descriptions, we take it as very likely that there are many
more downstream blobs that do so too. If all upstream blobs have
complete descriptions, we take that as a hint that the driver is a
candidate for enforcing strict DT bindings (considering that most
bindings are copy-pasted). If there are no upstream DT blobs, we take
the conservative route of allowing the workaround, unless the driver
maintainer instructs us otherwise.
The driver situation is as follows:
ar9331
~~~~~~
compatible strings:
- qca,ar9331-switch
1 occurrence in mainline device trees, part of SoC dtsi
(arch/mips/boot/dts/qca/ar9331.dtsi), description is not problematic.
Verdict: opt into strict DT bindings and out of workarounds.
b53
~~~
compatible strings:
- brcm,bcm5325
- brcm,bcm53115
- brcm,bcm53125
- brcm,bcm53128
- brcm,bcm5365
- brcm,bcm5389
- brcm,bcm5395
- brcm,bcm5397
- brcm,bcm5398
- brcm,bcm53010-srab
- brcm,bcm53011-srab
- brcm,bcm53012-srab
- brcm,bcm53018-srab
- brcm,bcm53019-srab
- brcm,bcm5301x-srab
- brcm,bcm11360-srab
- brcm,bcm58522-srab
- brcm,bcm58525-srab
- brcm,bcm58535-srab
- brcm,bcm58622-srab
- brcm,bcm58623-srab
- brcm,bcm58625-srab
- brcm,bcm88312-srab
- brcm,cygnus-srab
- brcm,nsp-srab
- brcm,omega-srab
- brcm,bcm3384-switch
- brcm,bcm6328-switch
- brcm,bcm6368-switch
- brcm,bcm63xx-switch
I've found at least these mainline DT blobs with problems:
arch/arm/boot/dts/bcm47094-linksys-panamera.dts
- lacks phy-mode
arch/arm/boot/dts/bcm47189-tenda-ac9.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/bcm47081-luxul-xap-1410.dts
arch/arm/boot/dts/bcm47081-luxul-xwr-1200.dts
arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/bcm47094-luxul-xbr-4500.dts
arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts
arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts
arch/arm/boot/dts/bcm953012er.dts
arch/arm/boot/dts/bcm4708-netgear-r6250.dts
arch/arm/boot/dts/bcm4708-buffalo-wzr-1166dhp-common.dtsi
arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts
arch/arm/boot/dts/bcm47094-luxul-abr-4500.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/bcm53016-meraki-mr32.dts
- lacks phy-mode
Verdict: opt into DSA workarounds.
bcm_sf2
~~~~~~~
compatible strings:
- brcm,bcm4908-switch
- brcm,bcm7445-switch-v4.0
- brcm,bcm7278-switch-v4.0
- brcm,bcm7278-switch-v4.8
A single occurrence in mainline
(arch/arm64/boot/dts/broadcom/bcm4908/bcm4908.dtsi), part of a SoC
dtsi, valid description. Florian Fainelli explains that most of the
bcm_sf2 device trees lack a full description for the internal IMP
ports.
Verdict: opt the BCM4908 into strict DT bindings, and opt the rest
into the workarounds. Note that even though BCM4908 has strict DT
bindings, it still does not register with phylink on the IMP port
due to it implementing ->adjust_link().
hellcreek
~~~~~~~~~
compatible strings:
- hirschmann,hellcreek-de1soc-r1
No occurrence in mainline device trees. Kurt Kanzenbach explains
that the downstream device trees lacked phy-mode and fixed link, and
needed work, but were fixed in the meantime.
Verdict: opt into strict DT bindings and out of workarounds.
lan9303
~~~~~~~
compatible strings:
- smsc,lan9303-mdio
- smsc,lan9303-i2c
1 occurrence in mainline device trees:
arch/arm/boot/dts/imx53-kp-hsc.dts
- no phy-mode, no fixed-link
Verdict: opt out of strict DT bindings and into workarounds.
lantiq_gswip
~~~~~~~~~~~~
compatible strings:
- lantiq,xrx200-gswip
- lantiq,xrx300-gswip
- lantiq,xrx330-gswip
No occurrences in mainline device trees. Martin Blumenstingl
confirms that the downstream OpenWrt device trees lack a proper
fixed-link and need work, and that the incomplete description can
even be seen in the example from
Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt.
Verdict: opt out of strict DT bindings and into workarounds.
microchip ksz
~~~~~~~~~~~~~
compatible strings:
- microchip,ksz8765
- microchip,ksz8794
- microchip,ksz8795
- microchip,ksz8863
- microchip,ksz8873
- microchip,ksz9477
- microchip,ksz9897
- microchip,ksz9893
- microchip,ksz9563
- microchip,ksz8563
- microchip,ksz9567
- microchip,lan9370
- microchip,lan9371
- microchip,lan9372
- microchip,lan9373
- microchip,lan9374
5 occurrences in mainline device trees, all descriptions are valid.
But we had a snafu for the ksz8795 and ksz9477 drivers where the
phy-mode property would be expected to be located directly under the
'switch' node rather than under a port OF node. It was fixed by
commit edecfa98f6 ("net: dsa: microchip: look for phy-mode in port
nodes"). The driver still has compatibility with the old DT blobs.
The lan937x support was added later than the above snafu was fixed,
and even though it has support for the broken DT blobs by virtue of
sharing a common probing function, I'll take it that its DT blobs
are correct.
Verdict: opt lan937x into strict DT bindings, and the others out.
mt7530
~~~~~~
compatible strings
- mediatek,mt7621
- mediatek,mt7530
- mediatek,mt7531
Multiple occurrences in mainline device trees, one is part of an SoC
dtsi (arch/mips/boot/dts/ralink/mt7621.dtsi), all descriptions are fine.
Verdict: opt into strict DT bindings and out of workarounds.
mv88e6060
~~~~~~~~~
compatible string:
- marvell,mv88e6060
no occurrences in mainline, nobody knows anybody who uses it.
Verdict: opt out of strict DT bindings and into workarounds.
mv88e6xxx
~~~~~~~~~
compatible strings:
- marvell,mv88e6085
- marvell,mv88e6190
- marvell,mv88e6250
Device trees that have incomplete descriptions of CPU or DSA ports:
arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi
- lacks phy-mode
arch/arm64/boot/dts/marvell/cn9130-crb.dtsi
- lacks phy-mode and fixed-link
arch/arm/boot/dts/vf610-zii-ssmb-spu3.dts
- lacks phy-mode
arch/arm/boot/dts/kirkwood-mv88f6281gtw-ge.dts
- lacks phy-mode
arch/arm/boot/dts/vf610-zii-spb4.dts
- lacks phy-mode
arch/arm/boot/dts/vf610-zii-cfu1.dts
- lacks phy-mode
arch/arm/boot/dts/vf610-zii-dev-rev-c.dts
- lacks phy-mode on CPU port, fixed-link on DSA ports
arch/arm/boot/dts/vf610-zii-dev-rev-b.dts
- lacks phy-mode on CPU port
arch/arm/boot/dts/armada-381-netgear-gs110emx.dts
- lacks phy-mode
arch/arm/boot/dts/vf610-zii-scu4-aib.dts
- lacks fixed-link on xgmii DSA ports and/or in-band-status on
2500base-x DSA ports, and phy-mode on CPU port
arch/arm/boot/dts/imx6qdl-gw5904.dtsi
- lacks phy-mode and fixed-link
arch/arm/boot/dts/armada-385-clearfog-gtr-l8.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/vf610-zii-ssmb-dtu.dts
- lacks phy-mode
arch/arm/boot/dts/kirkwood-dir665.dts
- lacks phy-mode
arch/arm/boot/dts/kirkwood-rd88f6281.dtsi
- lacks phy-mode
arch/arm/boot/dts/orion5x-netgear-wnr854t.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/armada-388-clearfog.dts
- lacks phy-mode
arch/arm/boot/dts/armada-xp-linksys-mamba.dts
- lacks phy-mode
arch/arm/boot/dts/armada-385-linksys.dtsi
- lacks phy-mode
arch/arm/boot/dts/imx6q-b450v3.dts
arch/arm/boot/dts/imx6q-b850v3.dts
- has a phy-handle but not a phy-mode?
arch/arm/boot/dts/armada-370-rd.dts
- lacks phy-mode
arch/arm/boot/dts/kirkwood-linksys-viper.dts
- lacks phy-mode
arch/arm/boot/dts/imx51-zii-rdu1.dts
- lacks phy-mode
arch/arm/boot/dts/imx51-zii-scu2-mezz.dts
- lacks phy-mode
arch/arm/boot/dts/imx6qdl-zii-rdu2.dtsi
- lacks phy-mode
arch/arm/boot/dts/armada-385-clearfog-gtr-s4.dts
- lacks phy-mode and fixed-link
Verdict: opt out of strict DT bindings and into workarounds.
ocelot
~~~~~~
compatible strings:
- mscc,vsc9953-switch
- felix (arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi) is a PCI
device, has no compatible string
2 occurrences in mainline, both are part of SoC dtsi and complete.
Verdict: opt into strict DT bindings and out of workarounds.
qca8k
~~~~~
compatible strings:
- qca,qca8327
- qca,qca8328
- qca,qca8334
- qca,qca8337
5 occurrences in mainline device trees, none of the descriptions are
problematic.
Verdict: opt into strict DT bindings and out of workarounds.
realtek
~~~~~~~
compatible strings:
- realtek,rtl8366rb
- realtek,rtl8365mb
2 occurrences in mainline, both descriptions are fine, additionally
rtl8365mb.c has a comment "The device tree firmware should also
specify the link partner of the extension port - either via a
fixed-link or other phy-handle."
Verdict: opt into strict DT bindings and out of workarounds.
rzn1_a5psw
~~~~~~~~~~
compatible strings:
- renesas,rzn1-a5psw
One single occurrence, part of SoC dtsi
(arch/arm/boot/dts/r9a06g032.dtsi), description is fine.
Verdict: opt into strict DT bindings and out of workarounds.
sja1105
~~~~~~~
Driver already validates its port OF nodes in
sja1105_parse_ports_node().
Verdict: opt into strict DT bindings and out of workarounds.
vsc73xx
~~~~~~~
compatible strings:
- vitesse,vsc7385
- vitesse,vsc7388
- vitesse,vsc7395
- vitesse,vsc7398
2 occurrences in mainline device trees, both descriptions are fine.
Verdict: opt into strict DT bindings and out of workarounds.
xrs700x
~~~~~~~
compatible strings:
- arrow,xrs7003e
- arrow,xrs7003f
- arrow,xrs7004e
- arrow,xrs7004f
no occurrences in mainline, we don't know.
Verdict: opt out of strict DT bindings and into workarounds.
Because there is a pattern where newly added switches reuse existing
drivers more often than introducing new ones, I've opted for deciding
who gets to opt into the workaround based on an OF compatible match
table in the DSA core. The alternative would have been to add another
boolean property to struct dsa_switch, like configure_vlan_while_not_filtering.
But this avoids situations where sometimes driver maintainers obfuscate
what goes on by sharing a common probing function, and therefore making
new switches inherit old quirks.
Side note, we also warn about missing properties for drivers that rely
on the workaround. This isn't an indication that we'll break
compatibility with those DT blobs any time soon, but is rather done to
raise awareness about the change, for future DT blob authors.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Acked-by: Alvin Šipraga <alsi@bang-olufsen.dk> # realtek
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a subset of functions that applies only to shared (DSA and CPU)
ports, yet this is difficult to comprehend by looking at their code alone.
These are dsa_port_link_register_of(), dsa_port_link_unregister_of(),
and the functions that only these 2 call.
Rename this class of functions to dsa_shared_port_* to make this fact
more evident, even if this goes against the apparent convention that
function names in port.c must start with dsa_port_.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dsa_port_link_register_of() and dsa_port_link_unregister_of() are not
written with the fact in mind that they can be called with a dp->dn that
is NULL (as evidenced even by the _of suffix in their name), but this is
exactly what happens.
How this behaves will differ depending on whether the backing driver
implements ->adjust_link() or not.
If it doesn't, the "if (of_phy_is_fixed_link(dp->dn) || phy_np)"
condition will return false, and dsa_port_link_register_of() will do
nothing and return 0.
If the driver does implement ->adjust_link(), the
"if (of_phy_is_fixed_link(dp->dn))" condition will return false
(dp->dn is NULL) and we will call dsa_port_setup_phy_of(). This will
call dsa_port_get_phy_device(), which will also return NULL, and we will
also do nothing and return 0.
It is hard to maintain this code and make future changes to it in this
state, so just suppress calls to these 2 functions if dp->dn is NULL.
The only functional effect is that if the driver does implement
->adjust_link(), we'll stop printing this to the console:
Using legacy PHYLIB callbacks. Please migrate to PHYLINK!
but instead we'll always print:
[ 8.539848] dsa-loop fixed-0:1f: skipping link registration for CPU port 5
This is for the better anyway, since "using legacy phylib callbacks"
was misleading information - we weren't issuing _any_ callbacks due to
dsa_port_get_phy_device() returning NULL.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ds->ops->port_stp_state_set() is, like most DSA methods, optional, and
if absent, the port is supposed to remain in the forwarding state (as
standalone). Such is the case with the mv88e6060 driver, which does not
offload the bridge layer. DSA warns that the STP state can't be changed
to FORWARDING as part of dsa_port_enable_rt(), when in fact it should not.
The error message is also not up to modern standards, so take the
opportunity to make it more descriptive.
Fixes: fd36454131 ("net: dsa: change scope of STP state setter")
Reported-by: Sergei Antonov <saproj@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Sergei Antonov <saproj@gmail.com>
Link: https://lore.kernel.org/r/20220816201445.1809483-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Due to an invalid conflict resolution on my side while working on 2
different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add()
does not store the database associated with a dsa_mac_addr structure.
So after adding an FDB entry associated with a LAG, dsa_mac_addr_find()
fails to find it while deleting it, because &a->db is zeroized memory
for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del()
returns -ENOENT rather than deleting the entry.
Fixes: c26933639b ("net: dsa: request drivers to perform FDB isolation")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220723012411.1125066-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "ds" iterator variable used in dsa_port_reset_vlan_filtering() ->
dsa_switch_for_each_port() overwrites the "dp" received as argument,
which is later used to call dsa_port_vlan_filtering() proper.
As a result, switches which do enter that code path (the ones with
vlan_filtering_is_global=true) will dereference an invalid dp in
dsa_port_reset_vlan_filtering() after leaving a VLAN-aware bridge.
Use a dedicated "other_dp" iterator variable to avoid this from
happening.
Fixes: d0004a020b ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The blamed refactoring commit changed a "port" iterator with "other_dp",
but still looked at the slave_dev of the dp outside the loop, instead of
other_dp->slave from the loop.
As a result, dsa_port_vlan_filtering() would not call
dsa_slave_manage_vlan_filtering() except for the port in cause, and not
for all switch ports as expected.
Fixes: d0004a020b ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core")
Reported-by: Lucian Banu <Lucian.Banu@westermo.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Microchip LAN937X switches have a tagging protocol which is
very similar to KSZ tagging. So that the implementation is added to
tag_ksz.c and reused common APIs
Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch that is present on the Renesas RZ/N1 SoC uses a specific
VLAN value followed by 6 bytes which contains forwarding configuration.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to allow dsa drivers to specify the .get_rmon_stats()
operation.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers might report that they are unable to bridge ports by
returning -EOPNOTSUPP, but still wants to override extack message.
In order to do so, in dsa_slave_changeupper(), if port_bridge_join()
returns -EOPNOTSUPP, check if extack message is set and if so, do not
override it.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As explained in commit 316580b69d ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If found, register the DSA internally allocated slave_mii_bus with an OF
"mdio" child object. It can save some drivers from creating their
custom internal MDIO bus.
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA has not supported (and probably will not support in the future
either) independent tagging protocols per CPU port.
Different switch drivers have different requirements, some may need to
replicate some settings for each CPU port, some may need to apply some
settings on a single CPU port, while some may have to configure some
global settings and then some per-CPU-port settings.
In any case, the current model where DSA calls ->change_tag_protocol for
each CPU port turns out to be impractical for drivers where there are
global things to be done. For example, felix calls dsa_tag_8021q_register(),
which makes no sense per CPU port, so it suppresses the second call.
Let drivers deal with replication towards all CPU ports, and remove the
CPU port argument from the function prototype.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the time - commit 7569459a52 ("net: dsa: manage flooding on the CPU
ports") - not introducing a dedicated switch callback for host flooding
made sense, because for the only user, the felix driver, there was
nothing different to do for the CPU port than set the flood flags on the
CPU port just like on any other bridge port.
There are 2 reasons why this approach is not good enough, however.
(1) Other drivers, like sja1105, support configuring flooding as a
function of {ingress port, egress port}, whereas the DSA
->port_bridge_flags() function only operates on an egress port.
So with that driver we'd have useless host flooding from user ports
which don't need it.
(2) Even with the felix driver, support for multiple CPU ports makes it
difficult to piggyback on ->port_bridge_flags(). The way in which
the felix driver is going to support host-filtered addresses with
multiple CPU ports is that it will direct these addresses towards
both CPU ports (in a sort of multicast fashion), then restrict the
forwarding to only one of the two using the forwarding masks.
Consequently, flooding will also be enabled towards both CPU ports.
However, ->port_bridge_flags() gets passed the index of a single CPU
port, and that leaves the flood settings out of sync between the 2
CPU ports.
This is to say, it's better to have a specific driver method for host
flooding, which takes the user port as argument. This solves problem (1)
by allowing the driver to do different things for different user ports,
and problem (2) by abstracting the operation and letting the driver do
whatever, rather than explicitly making the DSA core point to the CPU
port it thinks needs to be touched.
This new method also creates a problem, which is that cross-chip setups
are not handled. However I don't have hardware right now where I can
test what is the proper thing to do, and there isn't hardware compatible
with multi-switch trees that supports host flooding. So it remains a
problem to be tackled in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>