OpenCloudOS-Kernel/drivers/net/dsa/ocelot/felix.c

1934 lines
52 KiB
C
Raw Normal View History

net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
// SPDX-License-Identifier: GPL-2.0
/* Copyright 2019-2021 NXP
*
* This is an umbrella module for all network switches that are
* register-compatible with Ocelot and that perform I/O to their host CPU
* through an NPI (Node Processor Interface) Ethernet port.
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
*/
#include <uapi/linux/if_bridge.h>
#include <soc/mscc/ocelot_vcap.h>
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
#include <soc/mscc/ocelot_qsys.h>
#include <soc/mscc/ocelot_sys.h>
#include <soc/mscc/ocelot_dev.h>
#include <soc/mscc/ocelot_ana.h>
#include <soc/mscc/ocelot_ptp.h>
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
#include <soc/mscc/ocelot.h>
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
#include <linux/dsa/8021q.h>
#include <linux/dsa/ocelot.h>
net: dsa: felix: introduce support for Seville VSC9953 switch This is another switch from Vitesse / Microsemi / Microchip, that has 10 ports (8 external, 2 internal) and is integrated into the Freescale / NXP T1040 PowerPC SoC. It is very similar to Felix from NXP LS1028A, except that this is a platform device and Felix is a PCI device, and it doesn't support IEEE 1588 and TSN. Like Felix, this driver configures its own PCS on the internal MDIO bus using a phy_device abstraction for it (yes, it will be refactored to use a raw mdio_device, like other phylink drivers do, but let's keep it like that for now). But unlike Felix, the MDIO bus and the PCS are not from the same vendor. The PCS is the same QorIQ/Layerscape PCS as found in Felix/ENETC/DPAA*, but the internal MDIO bus that is used to access it is actually an instantiation of drivers/net/phy/mdio-mscc-miim.c. But it would be difficult to reuse that driver (it doesn't even use regmap, and it's less than 200 lines of code), so we hand-roll here some internal MDIO bus accessors within seville_vsc9953.c, which serves the purpose of driving the PCS absolutely fine. Also, same as Felix, the PCS doesn't support dynamic reconfiguration of SerDes protocol, so we need to do pre-validation of PHY mode from device tree and not let phylink change it. Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14 00:57:10 +08:00
#include <linux/platform_device.h>
net: dsa: tag_ocelot_8021q: add support for PTP timestamping For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:01 +08:00
#include <linux/ptp_classify.h>
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
#include <linux/module.h>
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
#include <linux/of_net.h>
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
#include <linux/pci.h>
#include <linux/of.h>
#include <net/pkt_sched.h>
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
#include <net/dsa.h>
#include "felix.h"
/* Translate the DSA database API into the ocelot switch library API,
* which uses VID 0 for all ports that aren't part of a bridge,
* and expects the bridge_dev to be NULL in that case.
*/
static struct net_device *felix_classify_db(struct dsa_db db)
{
switch (db.type) {
case DSA_DB_PORT:
case DSA_DB_LAG:
return NULL;
case DSA_DB_BRIDGE:
return db.bridge.dev;
default:
return ERR_PTR(-EOPNOTSUPP);
}
}
static void felix_migrate_pgid_bit(struct dsa_switch *ds, int from, int to,
int pgid)
{
struct ocelot *ocelot = ds->priv;
bool on;
u32 val;
val = ocelot_read_rix(ocelot, ANA_PGID_PGID, pgid);
on = !!(val & BIT(from));
val &= ~BIT(from);
if (on)
val |= BIT(to);
else
val &= ~BIT(to);
ocelot_write_rix(ocelot, val, ANA_PGID_PGID, pgid);
}
static void felix_migrate_flood_to_npi_port(struct dsa_switch *ds, int port)
{
struct ocelot *ocelot = ds->priv;
felix_migrate_pgid_bit(ds, port, ocelot->num_phys_ports, PGID_UC);
felix_migrate_pgid_bit(ds, port, ocelot->num_phys_ports, PGID_MC);
felix_migrate_pgid_bit(ds, port, ocelot->num_phys_ports, PGID_BC);
}
static void
felix_migrate_flood_to_tag_8021q_port(struct dsa_switch *ds, int port)
{
struct ocelot *ocelot = ds->priv;
felix_migrate_pgid_bit(ds, ocelot->num_phys_ports, port, PGID_UC);
felix_migrate_pgid_bit(ds, ocelot->num_phys_ports, port, PGID_MC);
felix_migrate_pgid_bit(ds, ocelot->num_phys_ports, port, PGID_BC);
}
net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:20 +08:00
/* Set up VCAP ES0 rules for pushing a tag_8021q VLAN towards the CPU such that
* the tagger can perform RX source port identification.
*/
static int felix_tag_8021q_vlan_add_rx(struct felix *felix, int port, u16 vid)
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
{
struct ocelot_vcap_filter *outer_tagging_rule;
struct ocelot *ocelot = &felix->ocelot;
struct dsa_switch *ds = felix->ds;
int key_length, upstream, err;
key_length = ocelot->vcap[VCAP_ES0].keys[VCAP_ES0_IGR_PORT].length;
upstream = dsa_upstream_port(ds, port);
outer_tagging_rule = kzalloc(sizeof(struct ocelot_vcap_filter),
GFP_KERNEL);
if (!outer_tagging_rule)
return -ENOMEM;
outer_tagging_rule->key_type = OCELOT_VCAP_KEY_ANY;
outer_tagging_rule->prio = 1;
outer_tagging_rule->id.cookie = OCELOT_VCAP_ES0_TAG_8021Q_RXVLAN(ocelot, port);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
outer_tagging_rule->id.tc_offload = false;
outer_tagging_rule->block_id = VCAP_ES0;
outer_tagging_rule->type = OCELOT_VCAP_FILTER_OFFLOAD;
outer_tagging_rule->lookup = 0;
outer_tagging_rule->ingress_port.value = port;
outer_tagging_rule->ingress_port.mask = GENMASK(key_length - 1, 0);
outer_tagging_rule->egress_port.value = upstream;
outer_tagging_rule->egress_port.mask = GENMASK(key_length - 1, 0);
outer_tagging_rule->action.push_outer_tag = OCELOT_ES0_TAG;
outer_tagging_rule->action.tag_a_tpid_sel = OCELOT_TAG_TPID_SEL_8021AD;
outer_tagging_rule->action.tag_a_vid_sel = 1;
outer_tagging_rule->action.vid_a_val = vid;
err = ocelot_vcap_filter_add(ocelot, outer_tagging_rule, NULL);
if (err)
kfree(outer_tagging_rule);
return err;
}
net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:20 +08:00
static int felix_tag_8021q_vlan_del_rx(struct felix *felix, int port, u16 vid)
{
struct ocelot_vcap_filter *outer_tagging_rule;
struct ocelot_vcap_block *block_vcap_es0;
struct ocelot *ocelot = &felix->ocelot;
block_vcap_es0 = &ocelot->block[VCAP_ES0];
outer_tagging_rule = ocelot_vcap_block_find_filter_by_id(block_vcap_es0,
port, false);
if (!outer_tagging_rule)
return -ENOENT;
return ocelot_vcap_filter_del(ocelot, outer_tagging_rule);
}
/* Set up VCAP IS1 rules for stripping the tag_8021q VLAN on TX and VCAP IS2
* rules for steering those tagged packets towards the correct destination port
*/
static int felix_tag_8021q_vlan_add_tx(struct felix *felix, int port, u16 vid)
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
{
struct ocelot_vcap_filter *untagging_rule, *redirect_rule;
struct ocelot *ocelot = &felix->ocelot;
struct dsa_switch *ds = felix->ds;
int upstream, err;
untagging_rule = kzalloc(sizeof(struct ocelot_vcap_filter), GFP_KERNEL);
if (!untagging_rule)
return -ENOMEM;
redirect_rule = kzalloc(sizeof(struct ocelot_vcap_filter), GFP_KERNEL);
if (!redirect_rule) {
kfree(untagging_rule);
return -ENOMEM;
}
upstream = dsa_upstream_port(ds, port);
untagging_rule->key_type = OCELOT_VCAP_KEY_ANY;
untagging_rule->ingress_port_mask = BIT(upstream);
untagging_rule->vlan.vid.value = vid;
untagging_rule->vlan.vid.mask = VLAN_VID_MASK;
untagging_rule->prio = 1;
untagging_rule->id.cookie = OCELOT_VCAP_IS1_TAG_8021Q_TXVLAN(ocelot, port);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
untagging_rule->id.tc_offload = false;
untagging_rule->block_id = VCAP_IS1;
untagging_rule->type = OCELOT_VCAP_FILTER_OFFLOAD;
untagging_rule->lookup = 0;
untagging_rule->action.vlan_pop_cnt_ena = true;
untagging_rule->action.vlan_pop_cnt = 1;
untagging_rule->action.pag_override_mask = 0xff;
untagging_rule->action.pag_val = port;
err = ocelot_vcap_filter_add(ocelot, untagging_rule, NULL);
if (err) {
kfree(untagging_rule);
kfree(redirect_rule);
return err;
}
redirect_rule->key_type = OCELOT_VCAP_KEY_ANY;
redirect_rule->ingress_port_mask = BIT(upstream);
redirect_rule->pag = port;
redirect_rule->prio = 1;
redirect_rule->id.cookie = OCELOT_VCAP_IS2_TAG_8021Q_TXVLAN(ocelot, port);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
redirect_rule->id.tc_offload = false;
redirect_rule->block_id = VCAP_IS2;
redirect_rule->type = OCELOT_VCAP_FILTER_OFFLOAD;
redirect_rule->lookup = 0;
redirect_rule->action.mask_mode = OCELOT_MASK_MODE_REDIRECT;
redirect_rule->action.port_mask = BIT(port);
err = ocelot_vcap_filter_add(ocelot, redirect_rule, NULL);
if (err) {
ocelot_vcap_filter_del(ocelot, untagging_rule);
kfree(redirect_rule);
return err;
}
return 0;
}
net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:20 +08:00
static int felix_tag_8021q_vlan_del_tx(struct felix *felix, int port, u16 vid)
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
{
struct ocelot_vcap_filter *untagging_rule, *redirect_rule;
struct ocelot_vcap_block *block_vcap_is1;
struct ocelot_vcap_block *block_vcap_is2;
struct ocelot *ocelot = &felix->ocelot;
int err;
block_vcap_is1 = &ocelot->block[VCAP_IS1];
block_vcap_is2 = &ocelot->block[VCAP_IS2];
untagging_rule = ocelot_vcap_block_find_filter_by_id(block_vcap_is1,
port, false);
if (!untagging_rule)
return -ENOENT;
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
err = ocelot_vcap_filter_del(ocelot, untagging_rule);
if (err)
return err;
redirect_rule = ocelot_vcap_block_find_filter_by_id(block_vcap_is2,
port, false);
if (!redirect_rule)
net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:20 +08:00
return -ENOENT;
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
return ocelot_vcap_filter_del(ocelot, redirect_rule);
}
net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:20 +08:00
static int felix_tag_8021q_vlan_add(struct dsa_switch *ds, int port, u16 vid,
u16 flags)
{
struct ocelot *ocelot = ds->priv;
int err;
/* tag_8021q.c assumes we are implementing this via port VLAN
* membership, which we aren't. So we don't need to add any VCAP filter
* for the CPU port.
*/
if (!dsa_is_user_port(ds, port))
return 0;
err = felix_tag_8021q_vlan_add_rx(ocelot_to_felix(ocelot), port, vid);
if (err)
return err;
err = felix_tag_8021q_vlan_add_tx(ocelot_to_felix(ocelot), port, vid);
if (err) {
felix_tag_8021q_vlan_del_rx(ocelot_to_felix(ocelot), port, vid);
return err;
}
return 0;
}
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
static int felix_tag_8021q_vlan_del(struct dsa_switch *ds, int port, u16 vid)
{
struct ocelot *ocelot = ds->priv;
net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:20 +08:00
int err;
if (!dsa_is_user_port(ds, port))
return 0;
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:20 +08:00
err = felix_tag_8021q_vlan_del_rx(ocelot_to_felix(ocelot), port, vid);
if (err)
return err;
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:20 +08:00
err = felix_tag_8021q_vlan_del_tx(ocelot_to_felix(ocelot), port, vid);
if (err) {
felix_tag_8021q_vlan_add_rx(ocelot_to_felix(ocelot), port, vid);
return err;
}
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
return 0;
}
/* Alternatively to using the NPI functionality, that same hardware MAC
* connected internally to the enetc or fman DSA master can be configured to
* use the software-defined tag_8021q frame format. As far as the hardware is
* concerned, it thinks it is a "dumb switch" - the queues of the CPU port
* module are now disconnected from it, but can still be accessed through
* register-based MMIO.
*/
static void felix_8021q_cpu_port_init(struct ocelot *ocelot, int port)
{
net: dsa: felix: enable cut-through forwarding between ports by default The VSC9959 switch embedded within NXP LS1028A (and that version of Ocelot switches only) supports cut-through forwarding - meaning it can start the process of looking up the destination ports for a packet, and forward towards those ports, before the entire packet has been received (as opposed to the store-and-forward mode). The up side is having lower forwarding latency for large packets. The down side is that frames with FCS errors are forwarded instead of being dropped. However, erroneous frames do not result in incorrect updates of the FDB or incorrect policer updates, since these processes are deferred inside the switch to the end of frame. Since the switch starts the cut-through forwarding process after all packet headers (including IP, if any) have been processed, packets with large headers and small payload do not see the benefit of lower forwarding latency. There are two cases that need special attention. The first is when a packet is multicast (or flooded) to multiple destinations, one of which doesn't have cut-through forwarding enabled. The switch deals with this automatically by disabling cut-through forwarding for the frame towards all destination ports. The second is when a packet is forwarded from a port of lower link speed towards a port of higher link speed. This is not handled by the hardware and needs software intervention. Since we practically need to update the cut-through forwarding domain from paths that aren't serialized by the rtnl_mutex (phylink mac_link_down/mac_link_up ops), this means we need to serialize physical link events with user space updates of bonding/bridging domains. Enabling cut-through forwarding is done per {egress port, traffic class}. I don't see any reason why this would be a configurable option as long as it works without issues, and there doesn't appear to be any user space configuration tool to toggle this on/off, so this patch enables cut-through forwarding on all eligible ports and traffic classes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25 20:58:08 +08:00
mutex_lock(&ocelot->fwd_domain_lock);
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
ocelot_port_set_dsa_8021q_cpu(ocelot, port);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
/* Overwrite PGID_CPU with the non-tagging port */
ocelot_write_rix(ocelot, BIT(port), ANA_PGID_PGID, PGID_CPU);
net: dsa: felix: enable cut-through forwarding between ports by default The VSC9959 switch embedded within NXP LS1028A (and that version of Ocelot switches only) supports cut-through forwarding - meaning it can start the process of looking up the destination ports for a packet, and forward towards those ports, before the entire packet has been received (as opposed to the store-and-forward mode). The up side is having lower forwarding latency for large packets. The down side is that frames with FCS errors are forwarded instead of being dropped. However, erroneous frames do not result in incorrect updates of the FDB or incorrect policer updates, since these processes are deferred inside the switch to the end of frame. Since the switch starts the cut-through forwarding process after all packet headers (including IP, if any) have been processed, packets with large headers and small payload do not see the benefit of lower forwarding latency. There are two cases that need special attention. The first is when a packet is multicast (or flooded) to multiple destinations, one of which doesn't have cut-through forwarding enabled. The switch deals with this automatically by disabling cut-through forwarding for the frame towards all destination ports. The second is when a packet is forwarded from a port of lower link speed towards a port of higher link speed. This is not handled by the hardware and needs software intervention. Since we practically need to update the cut-through forwarding domain from paths that aren't serialized by the rtnl_mutex (phylink mac_link_down/mac_link_up ops), this means we need to serialize physical link events with user space updates of bonding/bridging domains. Enabling cut-through forwarding is done per {egress port, traffic class}. I don't see any reason why this would be a configurable option as long as it works without issues, and there doesn't appear to be any user space configuration tool to toggle this on/off, so this patch enables cut-through forwarding on all eligible ports and traffic classes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25 20:58:08 +08:00
ocelot_apply_bridge_fwd_mask(ocelot, true);
mutex_unlock(&ocelot->fwd_domain_lock);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
}
static void felix_8021q_cpu_port_deinit(struct ocelot *ocelot, int port)
{
net: dsa: felix: enable cut-through forwarding between ports by default The VSC9959 switch embedded within NXP LS1028A (and that version of Ocelot switches only) supports cut-through forwarding - meaning it can start the process of looking up the destination ports for a packet, and forward towards those ports, before the entire packet has been received (as opposed to the store-and-forward mode). The up side is having lower forwarding latency for large packets. The down side is that frames with FCS errors are forwarded instead of being dropped. However, erroneous frames do not result in incorrect updates of the FDB or incorrect policer updates, since these processes are deferred inside the switch to the end of frame. Since the switch starts the cut-through forwarding process after all packet headers (including IP, if any) have been processed, packets with large headers and small payload do not see the benefit of lower forwarding latency. There are two cases that need special attention. The first is when a packet is multicast (or flooded) to multiple destinations, one of which doesn't have cut-through forwarding enabled. The switch deals with this automatically by disabling cut-through forwarding for the frame towards all destination ports. The second is when a packet is forwarded from a port of lower link speed towards a port of higher link speed. This is not handled by the hardware and needs software intervention. Since we practically need to update the cut-through forwarding domain from paths that aren't serialized by the rtnl_mutex (phylink mac_link_down/mac_link_up ops), this means we need to serialize physical link events with user space updates of bonding/bridging domains. Enabling cut-through forwarding is done per {egress port, traffic class}. I don't see any reason why this would be a configurable option as long as it works without issues, and there doesn't appear to be any user space configuration tool to toggle this on/off, so this patch enables cut-through forwarding on all eligible ports and traffic classes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25 20:58:08 +08:00
mutex_lock(&ocelot->fwd_domain_lock);
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
ocelot_port_unset_dsa_8021q_cpu(ocelot, port);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
/* Restore PGID_CPU */
ocelot_write_rix(ocelot, BIT(ocelot->num_phys_ports), ANA_PGID_PGID,
PGID_CPU);
net: dsa: felix: enable cut-through forwarding between ports by default The VSC9959 switch embedded within NXP LS1028A (and that version of Ocelot switches only) supports cut-through forwarding - meaning it can start the process of looking up the destination ports for a packet, and forward towards those ports, before the entire packet has been received (as opposed to the store-and-forward mode). The up side is having lower forwarding latency for large packets. The down side is that frames with FCS errors are forwarded instead of being dropped. However, erroneous frames do not result in incorrect updates of the FDB or incorrect policer updates, since these processes are deferred inside the switch to the end of frame. Since the switch starts the cut-through forwarding process after all packet headers (including IP, if any) have been processed, packets with large headers and small payload do not see the benefit of lower forwarding latency. There are two cases that need special attention. The first is when a packet is multicast (or flooded) to multiple destinations, one of which doesn't have cut-through forwarding enabled. The switch deals with this automatically by disabling cut-through forwarding for the frame towards all destination ports. The second is when a packet is forwarded from a port of lower link speed towards a port of higher link speed. This is not handled by the hardware and needs software intervention. Since we practically need to update the cut-through forwarding domain from paths that aren't serialized by the rtnl_mutex (phylink mac_link_down/mac_link_up ops), this means we need to serialize physical link events with user space updates of bonding/bridging domains. Enabling cut-through forwarding is done per {egress port, traffic class}. I don't see any reason why this would be a configurable option as long as it works without issues, and there doesn't appear to be any user space configuration tool to toggle this on/off, so this patch enables cut-through forwarding on all eligible ports and traffic classes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25 20:58:08 +08:00
ocelot_apply_bridge_fwd_mask(ocelot, true);
mutex_unlock(&ocelot->fwd_domain_lock);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
}
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
/* On switches with no extraction IRQ wired, trapped packets need to be
* replicated over Ethernet as well, otherwise we'd get no notification of
* their arrival when using the ocelot-8021q tagging protocol.
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
*/
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
static int felix_update_trapping_destinations(struct dsa_switch *ds,
bool using_tag_8021q)
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
{
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
struct ocelot *ocelot = ds->priv;
struct felix *felix = ocelot_to_felix(ocelot);
net: mscc: ocelot: mark traps with a bool instead of keeping them in a list Since the blamed commit, VCAP filters can appear on more than one list. If their action is "trap", they are chained on ocelot->traps via filter->trap_list. This is in addition to their normal placement on the VCAP block->rules list head. Therefore, when we free a VCAP filter, we must remove it from all lists it is a member of, including ocelot->traps. There are at least 2 bugs which are direct consequences of this design decision. First is the incorrect usage of list_empty(), meant to denote whether "filter" is chained into ocelot->traps via filter->trap_list. This does not do the correct thing, because list_empty() checks whether "head->next == head", but in our case, head->next == head->prev == NULL. So we dereference NULL pointers and die when we call list_del(). Second is the fact that not all places that should remove the filter from ocelot->traps do so. One example is ocelot_vcap_block_remove_filter(), which is where we have the main kfree(filter). By keeping freed filters in ocelot->traps we end up in a use-after-free in felix_update_trapping_destinations(). Attempting to fix all the buggy patterns is a whack-a-mole game which makes the driver unmaintainable. Actually this is what the previous patch version attempted to do: https://patchwork.kernel.org/project/netdevbpf/patch/20220503115728.834457-3-vladimir.oltean@nxp.com/ but it introduced another set of bugs, because there are other places in which create VCAP filters, not just ocelot_vcap_filter_create(): - ocelot_trap_add() - felix_tag_8021q_vlan_add_rx() - felix_tag_8021q_vlan_add_tx() Relying on the convention that all those code paths must call INIT_LIST_HEAD(&filter->trap_list) is not going to scale. So let's do what should have been done in the first place and keep a bool in struct ocelot_vcap_filter which denotes whether we are looking at a trapping rule or not. Iterating now happens over the main VCAP IS2 block->rules. The advantage is that we no longer risk having stale references to a freed filter, since it is only present in that list. Fixes: e42bd4ed09aa ("net: mscc: ocelot: keep traps in a list") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 07:54:59 +08:00
struct ocelot_vcap_block *block_vcap_is2;
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
struct ocelot_vcap_filter *trap;
enum ocelot_mask_mode mask_mode;
unsigned long port_mask;
struct dsa_port *dp;
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
bool cpu_copy_ena;
int cpu = -1, err;
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
if (!felix->info->quirk_no_xtr_irq)
return 0;
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
/* Figure out the current CPU port */
dsa_switch_for_each_cpu_port(dp, ds) {
cpu = dp->index;
break;
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
}
/* We are sure that "cpu" was found, otherwise
* dsa_tree_setup_default_cpu() would have failed earlier.
*/
net: mscc: ocelot: mark traps with a bool instead of keeping them in a list Since the blamed commit, VCAP filters can appear on more than one list. If their action is "trap", they are chained on ocelot->traps via filter->trap_list. This is in addition to their normal placement on the VCAP block->rules list head. Therefore, when we free a VCAP filter, we must remove it from all lists it is a member of, including ocelot->traps. There are at least 2 bugs which are direct consequences of this design decision. First is the incorrect usage of list_empty(), meant to denote whether "filter" is chained into ocelot->traps via filter->trap_list. This does not do the correct thing, because list_empty() checks whether "head->next == head", but in our case, head->next == head->prev == NULL. So we dereference NULL pointers and die when we call list_del(). Second is the fact that not all places that should remove the filter from ocelot->traps do so. One example is ocelot_vcap_block_remove_filter(), which is where we have the main kfree(filter). By keeping freed filters in ocelot->traps we end up in a use-after-free in felix_update_trapping_destinations(). Attempting to fix all the buggy patterns is a whack-a-mole game which makes the driver unmaintainable. Actually this is what the previous patch version attempted to do: https://patchwork.kernel.org/project/netdevbpf/patch/20220503115728.834457-3-vladimir.oltean@nxp.com/ but it introduced another set of bugs, because there are other places in which create VCAP filters, not just ocelot_vcap_filter_create(): - ocelot_trap_add() - felix_tag_8021q_vlan_add_rx() - felix_tag_8021q_vlan_add_tx() Relying on the convention that all those code paths must call INIT_LIST_HEAD(&filter->trap_list) is not going to scale. So let's do what should have been done in the first place and keep a bool in struct ocelot_vcap_filter which denotes whether we are looking at a trapping rule or not. Iterating now happens over the main VCAP IS2 block->rules. The advantage is that we no longer risk having stale references to a freed filter, since it is only present in that list. Fixes: e42bd4ed09aa ("net: mscc: ocelot: keep traps in a list") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 07:54:59 +08:00
block_vcap_is2 = &ocelot->block[VCAP_IS2];
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
/* Make sure all traps are set up for that destination */
net: mscc: ocelot: mark traps with a bool instead of keeping them in a list Since the blamed commit, VCAP filters can appear on more than one list. If their action is "trap", they are chained on ocelot->traps via filter->trap_list. This is in addition to their normal placement on the VCAP block->rules list head. Therefore, when we free a VCAP filter, we must remove it from all lists it is a member of, including ocelot->traps. There are at least 2 bugs which are direct consequences of this design decision. First is the incorrect usage of list_empty(), meant to denote whether "filter" is chained into ocelot->traps via filter->trap_list. This does not do the correct thing, because list_empty() checks whether "head->next == head", but in our case, head->next == head->prev == NULL. So we dereference NULL pointers and die when we call list_del(). Second is the fact that not all places that should remove the filter from ocelot->traps do so. One example is ocelot_vcap_block_remove_filter(), which is where we have the main kfree(filter). By keeping freed filters in ocelot->traps we end up in a use-after-free in felix_update_trapping_destinations(). Attempting to fix all the buggy patterns is a whack-a-mole game which makes the driver unmaintainable. Actually this is what the previous patch version attempted to do: https://patchwork.kernel.org/project/netdevbpf/patch/20220503115728.834457-3-vladimir.oltean@nxp.com/ but it introduced another set of bugs, because there are other places in which create VCAP filters, not just ocelot_vcap_filter_create(): - ocelot_trap_add() - felix_tag_8021q_vlan_add_rx() - felix_tag_8021q_vlan_add_tx() Relying on the convention that all those code paths must call INIT_LIST_HEAD(&filter->trap_list) is not going to scale. So let's do what should have been done in the first place and keep a bool in struct ocelot_vcap_filter which denotes whether we are looking at a trapping rule or not. Iterating now happens over the main VCAP IS2 block->rules. The advantage is that we no longer risk having stale references to a freed filter, since it is only present in that list. Fixes: e42bd4ed09aa ("net: mscc: ocelot: keep traps in a list") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 07:54:59 +08:00
list_for_each_entry(trap, &block_vcap_is2->rules, list) {
if (!trap->is_trap)
continue;
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
/* Figure out the current trapping destination */
if (using_tag_8021q) {
/* Redirect to the tag_8021q CPU port. If timestamps
* are necessary, also copy trapped packets to the CPU
* port module.
*/
mask_mode = OCELOT_MASK_MODE_REDIRECT;
port_mask = BIT(cpu);
cpu_copy_ena = !!trap->take_ts;
} else {
/* Trap packets only to the CPU port module, which is
* redirected to the NPI port (the DSA CPU port)
*/
mask_mode = OCELOT_MASK_MODE_PERMIT_DENY;
port_mask = 0;
cpu_copy_ena = true;
}
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
if (trap->action.mask_mode == mask_mode &&
trap->action.port_mask == port_mask &&
trap->action.cpu_copy_ena == cpu_copy_ena)
continue;
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
trap->action.mask_mode = mask_mode;
trap->action.port_mask = port_mask;
trap->action.cpu_copy_ena = cpu_copy_ena;
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
err = ocelot_vcap_filter_replace(ocelot, trap);
if (err)
return err;
}
net: dsa: tag_ocelot_8021q: add support for PTP timestamping For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:01 +08:00
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
return 0;
}
static int felix_setup_tag_8021q(struct dsa_switch *ds, int cpu)
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
{
struct ocelot *ocelot = ds->priv;
struct dsa_port *dp;
int err;
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
felix_8021q_cpu_port_init(ocelot, cpu);
dsa_switch_for_each_available_port(dp, ds) {
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
/* This overwrites ocelot_init():
* Do not forward BPDU frames to the CPU port module,
* for 2 reasons:
* - When these packets are injected from the tag_8021q
* CPU port, we want them to go out, not loop back
* into the system.
* - STP traffic ingressing on a user port should go to
* the tag_8021q CPU port, not to the hardware CPU
* port module.
*/
ocelot_write_gix(ocelot,
ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_REDIR_ENA(0),
ANA_PORT_CPU_FWD_BPDU_CFG, dp->index);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
}
err = dsa_tag_8021q_register(ds, htons(ETH_P_8021AD));
net: dsa: let the core manage the tag_8021q context The basic problem description is as follows: Be there 3 switches in a daisy chain topology: | sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 [ user ] [ user ] [ user ] [ dsa ] [ cpu ] | +---------+ | sw1p0 sw1p1 sw1p2 sw1p3 sw1p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] | +---------+ | sw2p0 sw2p1 sw2p2 sw2p3 sw2p4 [ user ] [ user ] [ user ] [ user ] [ dsa ] The CPU will not be able to ping through the user ports of the bottom-most switch (like for example sw2p0), simply because tag_8021q was not coded up for this scenario - it has always assumed DSA switch trees with a single switch. To add support for the topology above, we must admit that the RX VLAN of sw2p0 must be added on some ports of switches 0 and 1 as well. This is in fact a textbook example of thing that can use the cross-chip notifier framework that DSA has set up in switch.c. There is only one problem: core DSA (switch.c) is not able right now to make the connection between a struct dsa_switch *ds and a struct dsa_8021q_context *ctx. Right now, it is drivers who call into tag_8021q.c and always provide a struct dsa_8021q_context *ctx pointer, and tag_8021q.c calls them back with the .tag_8021q_vlan_{add,del} methods. But with cross-chip notifiers, it is possible for tag_8021q to call drivers without drivers having ever asked for anything. A good example is right above: when sw2p0 wants to set itself up for tag_8021q, the .tag_8021q_vlan_add method needs to be called for switches 1 and 0, so that they transport sw2p0's VLANs towards the CPU without dropping them. So instead of letting drivers manage the tag_8021q context, add a tag_8021q_ctx pointer inside of struct dsa_switch, which will be populated when dsa_tag_8021q_register() returns success. The patch is fairly long-winded because we are partly reverting commit 5899ee367ab3 ("net: dsa: tag_8021q: add a context structure") which made the driver-facing tag_8021q API use "ctx" instead of "ds". Now that we can access "ctx" directly from "ds", this is no longer needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 01:14:48 +08:00
if (err)
return err;
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
err = ocelot_migrate_mdbs(ocelot, BIT(ocelot->num_phys_ports),
BIT(cpu));
if (err)
goto out_tag_8021q_unregister;
felix_migrate_flood_to_tag_8021q_port(ds, cpu);
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
err = felix_update_trapping_destinations(ds, true);
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
if (err)
goto out_migrate_flood;
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
/* The ownership of the CPU port module's queues might have just been
* transferred to the tag_8021q tagger from the NPI-based tagger.
* So there might still be all sorts of crap in the queues. On the
* other hand, the MMIO-based matching of PTP frames is very brittle,
* so we need to be careful that there are no extra frames to be
* dequeued over MMIO, since we would never know to discard them.
*/
ocelot_drain_cpu_queue(ocelot, 0);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
return 0;
out_migrate_flood:
felix_migrate_flood_to_npi_port(ds, cpu);
ocelot_migrate_mdbs(ocelot, BIT(cpu), BIT(ocelot->num_phys_ports));
net: dsa: let the core manage the tag_8021q context The basic problem description is as follows: Be there 3 switches in a daisy chain topology: | sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 [ user ] [ user ] [ user ] [ dsa ] [ cpu ] | +---------+ | sw1p0 sw1p1 sw1p2 sw1p3 sw1p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] | +---------+ | sw2p0 sw2p1 sw2p2 sw2p3 sw2p4 [ user ] [ user ] [ user ] [ user ] [ dsa ] The CPU will not be able to ping through the user ports of the bottom-most switch (like for example sw2p0), simply because tag_8021q was not coded up for this scenario - it has always assumed DSA switch trees with a single switch. To add support for the topology above, we must admit that the RX VLAN of sw2p0 must be added on some ports of switches 0 and 1 as well. This is in fact a textbook example of thing that can use the cross-chip notifier framework that DSA has set up in switch.c. There is only one problem: core DSA (switch.c) is not able right now to make the connection between a struct dsa_switch *ds and a struct dsa_8021q_context *ctx. Right now, it is drivers who call into tag_8021q.c and always provide a struct dsa_8021q_context *ctx pointer, and tag_8021q.c calls them back with the .tag_8021q_vlan_{add,del} methods. But with cross-chip notifiers, it is possible for tag_8021q to call drivers without drivers having ever asked for anything. A good example is right above: when sw2p0 wants to set itself up for tag_8021q, the .tag_8021q_vlan_add method needs to be called for switches 1 and 0, so that they transport sw2p0's VLANs towards the CPU without dropping them. So instead of letting drivers manage the tag_8021q context, add a tag_8021q_ctx pointer inside of struct dsa_switch, which will be populated when dsa_tag_8021q_register() returns success. The patch is fairly long-winded because we are partly reverting commit 5899ee367ab3 ("net: dsa: tag_8021q: add a context structure") which made the driver-facing tag_8021q API use "ctx" instead of "ds". Now that we can access "ctx" directly from "ds", this is no longer needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 01:14:48 +08:00
out_tag_8021q_unregister:
dsa_tag_8021q_unregister(ds);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
return err;
}
static void felix_teardown_tag_8021q(struct dsa_switch *ds, int cpu)
{
struct ocelot *ocelot = ds->priv;
struct dsa_port *dp;
int err;
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
err = felix_update_trapping_destinations(ds, false);
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:00 +08:00
if (err)
dev_err(ds->dev, "felix_teardown_mmio_filtering returned %d",
err);
net: dsa: let the core manage the tag_8021q context The basic problem description is as follows: Be there 3 switches in a daisy chain topology: | sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 [ user ] [ user ] [ user ] [ dsa ] [ cpu ] | +---------+ | sw1p0 sw1p1 sw1p2 sw1p3 sw1p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] | +---------+ | sw2p0 sw2p1 sw2p2 sw2p3 sw2p4 [ user ] [ user ] [ user ] [ user ] [ dsa ] The CPU will not be able to ping through the user ports of the bottom-most switch (like for example sw2p0), simply because tag_8021q was not coded up for this scenario - it has always assumed DSA switch trees with a single switch. To add support for the topology above, we must admit that the RX VLAN of sw2p0 must be added on some ports of switches 0 and 1 as well. This is in fact a textbook example of thing that can use the cross-chip notifier framework that DSA has set up in switch.c. There is only one problem: core DSA (switch.c) is not able right now to make the connection between a struct dsa_switch *ds and a struct dsa_8021q_context *ctx. Right now, it is drivers who call into tag_8021q.c and always provide a struct dsa_8021q_context *ctx pointer, and tag_8021q.c calls them back with the .tag_8021q_vlan_{add,del} methods. But with cross-chip notifiers, it is possible for tag_8021q to call drivers without drivers having ever asked for anything. A good example is right above: when sw2p0 wants to set itself up for tag_8021q, the .tag_8021q_vlan_add method needs to be called for switches 1 and 0, so that they transport sw2p0's VLANs towards the CPU without dropping them. So instead of letting drivers manage the tag_8021q context, add a tag_8021q_ctx pointer inside of struct dsa_switch, which will be populated when dsa_tag_8021q_register() returns success. The patch is fairly long-winded because we are partly reverting commit 5899ee367ab3 ("net: dsa: tag_8021q: add a context structure") which made the driver-facing tag_8021q API use "ctx" instead of "ds". Now that we can access "ctx" directly from "ds", this is no longer needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 01:14:48 +08:00
dsa_tag_8021q_unregister(ds);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
dsa_switch_for_each_available_port(dp, ds) {
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
/* Restore the logic from ocelot_init:
* do not forward BPDU frames to the front ports.
*/
ocelot_write_gix(ocelot,
ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_REDIR_ENA(0xffff),
ANA_PORT_CPU_FWD_BPDU_CFG,
dp->index);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
}
felix_8021q_cpu_port_deinit(ocelot, cpu);
}
/* The CPU port module is connected to the Node Processor Interface (NPI). This
* is the mode through which frames can be injected from and extracted to an
* external CPU, over Ethernet. In NXP SoCs, the "external CPU" is the ARM CPU
* running Linux, and this forms a DSA setup together with the enetc or fman
* DSA master.
*/
static void felix_npi_port_init(struct ocelot *ocelot, int port)
{
ocelot->npi = port;
ocelot_write(ocelot, QSYS_EXT_CPU_CFG_EXT_CPUQ_MSK_M |
QSYS_EXT_CPU_CFG_EXT_CPU_PORT(port),
QSYS_EXT_CPU_CFG);
/* NPI port Injection/Extraction configuration */
ocelot_fields_write(ocelot, port, SYS_PORT_MODE_INCL_XTR_HDR,
ocelot->npi_xtr_prefix);
ocelot_fields_write(ocelot, port, SYS_PORT_MODE_INCL_INJ_HDR,
ocelot->npi_inj_prefix);
/* Disable transmission of pause frames */
ocelot_fields_write(ocelot, port, SYS_PAUSE_CFG_PAUSE_ENA, 0);
}
static void felix_npi_port_deinit(struct ocelot *ocelot, int port)
{
/* Restore hardware defaults */
int unused_port = ocelot->num_phys_ports + 2;
ocelot->npi = -1;
ocelot_write(ocelot, QSYS_EXT_CPU_CFG_EXT_CPU_PORT(unused_port),
QSYS_EXT_CPU_CFG);
ocelot_fields_write(ocelot, port, SYS_PORT_MODE_INCL_XTR_HDR,
OCELOT_TAG_PREFIX_DISABLED);
ocelot_fields_write(ocelot, port, SYS_PORT_MODE_INCL_INJ_HDR,
OCELOT_TAG_PREFIX_DISABLED);
/* Enable transmission of pause frames */
ocelot_fields_write(ocelot, port, SYS_PAUSE_CFG_PAUSE_ENA, 1);
}
static int felix_setup_tag_npi(struct dsa_switch *ds, int cpu)
{
struct ocelot *ocelot = ds->priv;
int err;
err = ocelot_migrate_mdbs(ocelot, BIT(cpu),
BIT(ocelot->num_phys_ports));
if (err)
return err;
felix_migrate_flood_to_npi_port(ds, cpu);
felix_npi_port_init(ocelot, cpu);
return 0;
}
static void felix_teardown_tag_npi(struct dsa_switch *ds, int cpu)
{
struct ocelot *ocelot = ds->priv;
felix_npi_port_deinit(ocelot, cpu);
}
static int felix_set_tag_protocol(struct dsa_switch *ds, int cpu,
enum dsa_tag_protocol proto)
{
int err;
switch (proto) {
net: dsa: tag_ocelot: create separate tagger for Seville The ocelot tagger is a hot mess currently, it relies on memory initialized by the attached driver for basic frame transmission. This is against all that DSA tagging protocols stand for, which is that the transmission and reception of a DSA-tagged frame, the data path, should be independent from the switch control path, because the tag protocol is in principle hot-pluggable and reusable across switches (even if in practice it wasn't until very recently). But if another driver like dsa_loop wants to make use of tag_ocelot, it couldn't. This was done to have common code between Felix and Ocelot, which have one bit difference in the frame header format. Quoting from commit 67c2404922c2 ("net: dsa: felix: create a template for the DSA tags on xmit"): Other alternatives have been analyzed, such as: - Create a separate tag_seville.c: too much code duplication for just 1 bit field difference. - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like tag_brcm.c, which would have a separate .xmit function. Again, too much code duplication for just 1 bit field difference. - Allocate the template from the init function of the tag_ocelot.c module, instead of from the driver: couldn't figure out a method of accessing the correct port template corresponding to the correct tagger in the .xmit function. The really interesting part is that Seville should have had its own tagging protocol defined - it is not compatible on the wire with Ocelot, even for that single bit. In principle, a packet generated by DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain way, but when booted on NXP T1040 it would look differently. The reverse is also true: a packet generated by a Seville switch would be interpreted incorrectly by Wireshark if it was told it was generated by an Ocelot switch. Actually things are a bit more nuanced. If we concentrate only on the DSA tag, what I said above is true, but Ocelot/Seville also support an optional DSA tag prefix, which can be short or long, and it is possible to distinguish the two taggers based on an integer constant put in that prefix. Nonetheless, creating a separate tagger is still justified, since the tag prefix is optional, and without it, there is again no way to distinguish. Claiming backwards binary compatibility is a bit more tough, since I've already changed the format of tag_ocelot once, in commit 5124197ce58b ("net: dsa: tag_ocelot: use a short prefix on both ingress and egress"). Therefore I am not very concerned with treating this as a bugfix and backporting it to stable kernels (which would be another mess due to the fact that there would be lots of conflicts with the other DSA_TAG_PROTO* definitions). It's just simpler to say that the string values of the taggers have ABI value starting with kernel 5.12, which will be when the changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging goes live. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:37:58 +08:00
case DSA_TAG_PROTO_SEVILLE:
case DSA_TAG_PROTO_OCELOT:
err = felix_setup_tag_npi(ds, cpu);
break;
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
case DSA_TAG_PROTO_OCELOT_8021Q:
err = felix_setup_tag_8021q(ds, cpu);
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
break;
default:
err = -EPROTONOSUPPORT;
}
return err;
}
static void felix_del_tag_protocol(struct dsa_switch *ds, int cpu,
enum dsa_tag_protocol proto)
{
switch (proto) {
net: dsa: tag_ocelot: create separate tagger for Seville The ocelot tagger is a hot mess currently, it relies on memory initialized by the attached driver for basic frame transmission. This is against all that DSA tagging protocols stand for, which is that the transmission and reception of a DSA-tagged frame, the data path, should be independent from the switch control path, because the tag protocol is in principle hot-pluggable and reusable across switches (even if in practice it wasn't until very recently). But if another driver like dsa_loop wants to make use of tag_ocelot, it couldn't. This was done to have common code between Felix and Ocelot, which have one bit difference in the frame header format. Quoting from commit 67c2404922c2 ("net: dsa: felix: create a template for the DSA tags on xmit"): Other alternatives have been analyzed, such as: - Create a separate tag_seville.c: too much code duplication for just 1 bit field difference. - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like tag_brcm.c, which would have a separate .xmit function. Again, too much code duplication for just 1 bit field difference. - Allocate the template from the init function of the tag_ocelot.c module, instead of from the driver: couldn't figure out a method of accessing the correct port template corresponding to the correct tagger in the .xmit function. The really interesting part is that Seville should have had its own tagging protocol defined - it is not compatible on the wire with Ocelot, even for that single bit. In principle, a packet generated by DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain way, but when booted on NXP T1040 it would look differently. The reverse is also true: a packet generated by a Seville switch would be interpreted incorrectly by Wireshark if it was told it was generated by an Ocelot switch. Actually things are a bit more nuanced. If we concentrate only on the DSA tag, what I said above is true, but Ocelot/Seville also support an optional DSA tag prefix, which can be short or long, and it is possible to distinguish the two taggers based on an integer constant put in that prefix. Nonetheless, creating a separate tagger is still justified, since the tag prefix is optional, and without it, there is again no way to distinguish. Claiming backwards binary compatibility is a bit more tough, since I've already changed the format of tag_ocelot once, in commit 5124197ce58b ("net: dsa: tag_ocelot: use a short prefix on both ingress and egress"). Therefore I am not very concerned with treating this as a bugfix and backporting it to stable kernels (which would be another mess due to the fact that there would be lots of conflicts with the other DSA_TAG_PROTO* definitions). It's just simpler to say that the string values of the taggers have ABI value starting with kernel 5.12, which will be when the changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging goes live. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:37:58 +08:00
case DSA_TAG_PROTO_SEVILLE:
case DSA_TAG_PROTO_OCELOT:
felix_teardown_tag_npi(ds, cpu);
break;
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
case DSA_TAG_PROTO_OCELOT_8021Q:
felix_teardown_tag_8021q(ds, cpu);
break;
default:
break;
}
}
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
/* This always leaves the switch in a consistent state, because although the
* tag_8021q setup can fail, the NPI setup can't. So either the change is made,
* or the restoration is guaranteed to work.
*/
static int felix_change_tag_protocol(struct dsa_switch *ds, int cpu,
enum dsa_tag_protocol proto)
{
struct ocelot *ocelot = ds->priv;
struct felix *felix = ocelot_to_felix(ocelot);
enum dsa_tag_protocol old_proto = felix->tag_proto;
bool cpu_port_active = false;
struct dsa_port *dp;
int err;
net: dsa: tag_ocelot: create separate tagger for Seville The ocelot tagger is a hot mess currently, it relies on memory initialized by the attached driver for basic frame transmission. This is against all that DSA tagging protocols stand for, which is that the transmission and reception of a DSA-tagged frame, the data path, should be independent from the switch control path, because the tag protocol is in principle hot-pluggable and reusable across switches (even if in practice it wasn't until very recently). But if another driver like dsa_loop wants to make use of tag_ocelot, it couldn't. This was done to have common code between Felix and Ocelot, which have one bit difference in the frame header format. Quoting from commit 67c2404922c2 ("net: dsa: felix: create a template for the DSA tags on xmit"): Other alternatives have been analyzed, such as: - Create a separate tag_seville.c: too much code duplication for just 1 bit field difference. - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like tag_brcm.c, which would have a separate .xmit function. Again, too much code duplication for just 1 bit field difference. - Allocate the template from the init function of the tag_ocelot.c module, instead of from the driver: couldn't figure out a method of accessing the correct port template corresponding to the correct tagger in the .xmit function. The really interesting part is that Seville should have had its own tagging protocol defined - it is not compatible on the wire with Ocelot, even for that single bit. In principle, a packet generated by DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain way, but when booted on NXP T1040 it would look differently. The reverse is also true: a packet generated by a Seville switch would be interpreted incorrectly by Wireshark if it was told it was generated by an Ocelot switch. Actually things are a bit more nuanced. If we concentrate only on the DSA tag, what I said above is true, but Ocelot/Seville also support an optional DSA tag prefix, which can be short or long, and it is possible to distinguish the two taggers based on an integer constant put in that prefix. Nonetheless, creating a separate tagger is still justified, since the tag prefix is optional, and without it, there is again no way to distinguish. Claiming backwards binary compatibility is a bit more tough, since I've already changed the format of tag_ocelot once, in commit 5124197ce58b ("net: dsa: tag_ocelot: use a short prefix on both ingress and egress"). Therefore I am not very concerned with treating this as a bugfix and backporting it to stable kernels (which would be another mess due to the fact that there would be lots of conflicts with the other DSA_TAG_PROTO* definitions). It's just simpler to say that the string values of the taggers have ABI value starting with kernel 5.12, which will be when the changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging goes live. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:37:58 +08:00
if (proto != DSA_TAG_PROTO_SEVILLE &&
proto != DSA_TAG_PROTO_OCELOT &&
net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 09:00:09 +08:00
proto != DSA_TAG_PROTO_OCELOT_8021Q)
return -EPROTONOSUPPORT;
/* We don't support multiple CPU ports, yet the DT blob may have
* multiple CPU ports defined. The first CPU port is the active one,
* the others are inactive. In this case, DSA will call
* ->change_tag_protocol() multiple times, once per CPU port.
* Since we implement the tagging protocol change towards "ocelot" or
* "seville" as effectively initializing the NPI port, what we are
* doing is effectively changing who the NPI port is to the last @cpu
* argument passed, which is an unused DSA CPU port and not the one
* that should actively pass traffic.
* Suppress DSA's calls on CPU ports that are inactive.
*/
dsa_switch_for_each_user_port(dp, ds) {
if (dp->cpu_dp->index == cpu) {
cpu_port_active = true;
break;
}
}
if (!cpu_port_active)
return 0;
felix_del_tag_protocol(ds, cpu, old_proto);
err = felix_set_tag_protocol(ds, cpu, proto);
if (err) {
felix_set_tag_protocol(ds, cpu, old_proto);
return err;
}
felix->tag_proto = proto;
return 0;
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
static enum dsa_tag_protocol felix_get_tag_protocol(struct dsa_switch *ds,
int port,
enum dsa_tag_protocol mp)
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
{
struct ocelot *ocelot = ds->priv;
struct felix *felix = ocelot_to_felix(ocelot);
return felix->tag_proto;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
net: dsa: felix: manage host flooding using a specific driver callback At the time - commit 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") - not introducing a dedicated switch callback for host flooding made sense, because for the only user, the felix driver, there was nothing different to do for the CPU port than set the flood flags on the CPU port just like on any other bridge port. There are 2 reasons why this approach is not good enough, however. (1) Other drivers, like sja1105, support configuring flooding as a function of {ingress port, egress port}, whereas the DSA ->port_bridge_flags() function only operates on an egress port. So with that driver we'd have useless host flooding from user ports which don't need it. (2) Even with the felix driver, support for multiple CPU ports makes it difficult to piggyback on ->port_bridge_flags(). The way in which the felix driver is going to support host-filtered addresses with multiple CPU ports is that it will direct these addresses towards both CPU ports (in a sort of multicast fashion), then restrict the forwarding to only one of the two using the forwarding masks. Consequently, flooding will also be enabled towards both CPU ports. However, ->port_bridge_flags() gets passed the index of a single CPU port, and that leaves the flood settings out of sync between the 2 CPU ports. This is to say, it's better to have a specific driver method for host flooding, which takes the user port as argument. This solves problem (1) by allowing the driver to do different things for different user ports, and problem (2) by abstracting the operation and letting the driver do whatever, rather than explicitly making the DSA core point to the CPU port it thinks needs to be touched. This new method also creates a problem, which is that cross-chip setups are not handled. However I don't have hardware right now where I can test what is the proper thing to do, and there isn't hardware compatible with multi-switch trees that supports host flooding. So it remains a problem to be tackled in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11 17:50:17 +08:00
static void felix_port_set_host_flood(struct dsa_switch *ds, int port,
bool uc, bool mc)
{
struct ocelot *ocelot = ds->priv;
struct felix *felix = ocelot_to_felix(ocelot);
unsigned long mask, val;
if (uc)
felix->host_flood_uc_mask |= BIT(port);
else
felix->host_flood_uc_mask &= ~BIT(port);
if (mc)
felix->host_flood_mc_mask |= BIT(port);
else
felix->host_flood_mc_mask &= ~BIT(port);
if (felix->tag_proto == DSA_TAG_PROTO_OCELOT_8021Q)
mask = dsa_cpu_ports(ds);
else
mask = BIT(ocelot->num_phys_ports);
val = (felix->host_flood_uc_mask) ? mask : 0;
ocelot_rmw_rix(ocelot, val, mask, ANA_PGID_PGID, PGID_UC);
val = (felix->host_flood_mc_mask) ? mask : 0;
ocelot_rmw_rix(ocelot, val, mask, ANA_PGID_PGID, PGID_MC);
ocelot_rmw_rix(ocelot, val, mask, ANA_PGID_PGID, PGID_MCIPV4);
ocelot_rmw_rix(ocelot, val, mask, ANA_PGID_PGID, PGID_MCIPV6);
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
static int felix_set_ageing_time(struct dsa_switch *ds,
unsigned int ageing_time)
{
struct ocelot *ocelot = ds->priv;
ocelot_set_ageing_time(ocelot, ageing_time);
return 0;
}
static void felix_port_fast_age(struct dsa_switch *ds, int port)
{
struct ocelot *ocelot = ds->priv;
int err;
err = ocelot_mact_flush(ocelot, port);
if (err)
dev_err(ds->dev, "Flushing MAC table on port %d returned %pe\n",
port, ERR_PTR(err));
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
static int felix_fdb_dump(struct dsa_switch *ds, int port,
dsa_fdb_dump_cb_t *cb, void *data)
{
struct ocelot *ocelot = ds->priv;
return ocelot_fdb_dump(ocelot, port, cb, data);
}
static int felix_fdb_add(struct dsa_switch *ds, int port,
net: dsa: request drivers to perform FDB isolation For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges. The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del} aka the pre-crosschip-notifier functions. Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method. DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges. The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone. It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions. Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient. DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge. In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:22 +08:00
const unsigned char *addr, u16 vid,
struct dsa_db db)
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
{
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
struct net_device *bridge_dev = felix_classify_db(db);
struct dsa_port *dp = dsa_to_port(ds, port);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
struct ocelot *ocelot = ds->priv;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
if (IS_ERR(bridge_dev))
return PTR_ERR(bridge_dev);
if (dsa_port_is_cpu(dp) && !bridge_dev &&
net: dsa: felix: avoid early deletion of host FDB entries The Felix driver declares FDB isolation but puts all standalone ports in VID 0. This is mostly problem-free as discussed with Alvin here: https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870 however there is one catch. DSA still thinks that FDB entries are installed on the CPU port as many times as there are user ports, and this is problematic when multiple user ports share the same MAC address. Consider the default case where all user ports inherit their MAC address from the DSA master, and then the user runs: ip link set swp0 address 00:01:02:03:04:05 The above will make dsa_slave_set_mac_address() call dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's standalone database, and dsa_port_standalone_host_fdb_del() for the old address of swp0, again in swp0's standalone database. Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down to the felix driver, which will end up deleting the old MAC address from the CPU port. But this is still in use by other user ports, so we end up breaking unicast termination for them. There isn't a problem in the fact that DSA keeps track of host standalone addresses in the individual database of each user port: some drivers like sja1105 need this. There also isn't a problem in the fact that some drivers choose the same VID/FID for all standalone ports. It is just that the deletion of these host addresses must be delayed until they are known to not be in use any longer, and only the driver has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and &cpu_db->mdbs, it is just a matter of walking over those lists and see whether the same MAC address is present on the CPU port in the port db of another user port. I have considered reusing the generic dsa_port_walk_fdbs() and dsa_port_walk_mdbs() schemes for this, but locking makes it difficult. In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that we introduce an unlocked variant of the address iterator, we'd still need some relatively complex data structures, and a void *ctx in the dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are able to figure out, after iterating, whether the same MAC address is or isn't present in the port db of another port. All the above, plus the fact that I expect other drivers to follow the same model as felix where all standalone ports use the same FID, made me conclude that a generic method provided by DSA is necessary: dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this from the ->port_fdb_del() handler for the CPU port, when the database was classified to either a port db, or a LAG db. For symmetry, we also call this from ->port_fdb_add(), because if the address was installed once, then installing it a second time serves no purpose: it's already in hardware in VID 0 and it affects all standalone ports. This change moves dsa_db_equal() from switch.c to dsa.c, since it now has one more caller. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-08 17:15:15 +08:00
dsa_fdb_present_in_other_db(ds, port, addr, vid, db))
return 0;
if (dsa_port_is_cpu(dp))
port = PGID_CPU;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
return ocelot_fdb_add(ocelot, port, addr, vid, bridge_dev);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
static int felix_fdb_del(struct dsa_switch *ds, int port,
net: dsa: request drivers to perform FDB isolation For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges. The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del} aka the pre-crosschip-notifier functions. Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method. DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges. The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone. It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions. Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient. DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge. In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:22 +08:00
const unsigned char *addr, u16 vid,
struct dsa_db db)
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
{
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
struct net_device *bridge_dev = felix_classify_db(db);
struct dsa_port *dp = dsa_to_port(ds, port);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
struct ocelot *ocelot = ds->priv;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
if (IS_ERR(bridge_dev))
return PTR_ERR(bridge_dev);
if (dsa_port_is_cpu(dp) && !bridge_dev &&
net: dsa: felix: avoid early deletion of host FDB entries The Felix driver declares FDB isolation but puts all standalone ports in VID 0. This is mostly problem-free as discussed with Alvin here: https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870 however there is one catch. DSA still thinks that FDB entries are installed on the CPU port as many times as there are user ports, and this is problematic when multiple user ports share the same MAC address. Consider the default case where all user ports inherit their MAC address from the DSA master, and then the user runs: ip link set swp0 address 00:01:02:03:04:05 The above will make dsa_slave_set_mac_address() call dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's standalone database, and dsa_port_standalone_host_fdb_del() for the old address of swp0, again in swp0's standalone database. Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down to the felix driver, which will end up deleting the old MAC address from the CPU port. But this is still in use by other user ports, so we end up breaking unicast termination for them. There isn't a problem in the fact that DSA keeps track of host standalone addresses in the individual database of each user port: some drivers like sja1105 need this. There also isn't a problem in the fact that some drivers choose the same VID/FID for all standalone ports. It is just that the deletion of these host addresses must be delayed until they are known to not be in use any longer, and only the driver has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and &cpu_db->mdbs, it is just a matter of walking over those lists and see whether the same MAC address is present on the CPU port in the port db of another user port. I have considered reusing the generic dsa_port_walk_fdbs() and dsa_port_walk_mdbs() schemes for this, but locking makes it difficult. In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that we introduce an unlocked variant of the address iterator, we'd still need some relatively complex data structures, and a void *ctx in the dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are able to figure out, after iterating, whether the same MAC address is or isn't present in the port db of another port. All the above, plus the fact that I expect other drivers to follow the same model as felix where all standalone ports use the same FID, made me conclude that a generic method provided by DSA is necessary: dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this from the ->port_fdb_del() handler for the CPU port, when the database was classified to either a port db, or a LAG db. For symmetry, we also call this from ->port_fdb_add(), because if the address was installed once, then installing it a second time serves no purpose: it's already in hardware in VID 0 and it affects all standalone ports. This change moves dsa_db_equal() from switch.c to dsa.c, since it now has one more caller. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-08 17:15:15 +08:00
dsa_fdb_present_in_other_db(ds, port, addr, vid, db))
return 0;
if (dsa_port_is_cpu(dp))
port = PGID_CPU;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
return ocelot_fdb_del(ocelot, port, addr, vid, bridge_dev);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
static int felix_lag_fdb_add(struct dsa_switch *ds, struct dsa_lag lag,
net: dsa: request drivers to perform FDB isolation For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges. The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del} aka the pre-crosschip-notifier functions. Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method. DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges. The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone. It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions. Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient. DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge. In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:22 +08:00
const unsigned char *addr, u16 vid,
struct dsa_db db)
{
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
struct net_device *bridge_dev = felix_classify_db(db);
struct ocelot *ocelot = ds->priv;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
if (IS_ERR(bridge_dev))
return PTR_ERR(bridge_dev);
return ocelot_lag_fdb_add(ocelot, lag.dev, addr, vid, bridge_dev);
}
static int felix_lag_fdb_del(struct dsa_switch *ds, struct dsa_lag lag,
net: dsa: request drivers to perform FDB isolation For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges. The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del} aka the pre-crosschip-notifier functions. Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method. DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges. The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone. It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions. Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient. DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge. In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:22 +08:00
const unsigned char *addr, u16 vid,
struct dsa_db db)
{
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
struct net_device *bridge_dev = felix_classify_db(db);
struct ocelot *ocelot = ds->priv;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
if (IS_ERR(bridge_dev))
return PTR_ERR(bridge_dev);
return ocelot_lag_fdb_del(ocelot, lag.dev, addr, vid, bridge_dev);
}
static int felix_mdb_add(struct dsa_switch *ds, int port,
net: dsa: request drivers to perform FDB isolation For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges. The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del} aka the pre-crosschip-notifier functions. Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method. DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges. The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone. It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions. Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient. DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge. In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:22 +08:00
const struct switchdev_obj_port_mdb *mdb,
struct dsa_db db)
{
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
struct net_device *bridge_dev = felix_classify_db(db);
struct ocelot *ocelot = ds->priv;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
if (IS_ERR(bridge_dev))
return PTR_ERR(bridge_dev);
net: dsa: felix: avoid early deletion of host FDB entries The Felix driver declares FDB isolation but puts all standalone ports in VID 0. This is mostly problem-free as discussed with Alvin here: https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870 however there is one catch. DSA still thinks that FDB entries are installed on the CPU port as many times as there are user ports, and this is problematic when multiple user ports share the same MAC address. Consider the default case where all user ports inherit their MAC address from the DSA master, and then the user runs: ip link set swp0 address 00:01:02:03:04:05 The above will make dsa_slave_set_mac_address() call dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's standalone database, and dsa_port_standalone_host_fdb_del() for the old address of swp0, again in swp0's standalone database. Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down to the felix driver, which will end up deleting the old MAC address from the CPU port. But this is still in use by other user ports, so we end up breaking unicast termination for them. There isn't a problem in the fact that DSA keeps track of host standalone addresses in the individual database of each user port: some drivers like sja1105 need this. There also isn't a problem in the fact that some drivers choose the same VID/FID for all standalone ports. It is just that the deletion of these host addresses must be delayed until they are known to not be in use any longer, and only the driver has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and &cpu_db->mdbs, it is just a matter of walking over those lists and see whether the same MAC address is present on the CPU port in the port db of another user port. I have considered reusing the generic dsa_port_walk_fdbs() and dsa_port_walk_mdbs() schemes for this, but locking makes it difficult. In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that we introduce an unlocked variant of the address iterator, we'd still need some relatively complex data structures, and a void *ctx in the dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are able to figure out, after iterating, whether the same MAC address is or isn't present in the port db of another port. All the above, plus the fact that I expect other drivers to follow the same model as felix where all standalone ports use the same FID, made me conclude that a generic method provided by DSA is necessary: dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this from the ->port_fdb_del() handler for the CPU port, when the database was classified to either a port db, or a LAG db. For symmetry, we also call this from ->port_fdb_add(), because if the address was installed once, then installing it a second time serves no purpose: it's already in hardware in VID 0 and it affects all standalone ports. This change moves dsa_db_equal() from switch.c to dsa.c, since it now has one more caller. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-08 17:15:15 +08:00
if (dsa_is_cpu_port(ds, port) && !bridge_dev &&
dsa_mdb_present_in_other_db(ds, port, mdb, db))
return 0;
if (port == ocelot->npi)
port = ocelot->num_phys_ports;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
return ocelot_port_mdb_add(ocelot, port, mdb, bridge_dev);
}
static int felix_mdb_del(struct dsa_switch *ds, int port,
net: dsa: request drivers to perform FDB isolation For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges. The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del} aka the pre-crosschip-notifier functions. Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method. DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges. The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone. It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions. Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient. DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge. In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:22 +08:00
const struct switchdev_obj_port_mdb *mdb,
struct dsa_db db)
{
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
struct net_device *bridge_dev = felix_classify_db(db);
struct ocelot *ocelot = ds->priv;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
if (IS_ERR(bridge_dev))
return PTR_ERR(bridge_dev);
net: dsa: felix: avoid early deletion of host FDB entries The Felix driver declares FDB isolation but puts all standalone ports in VID 0. This is mostly problem-free as discussed with Alvin here: https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870 however there is one catch. DSA still thinks that FDB entries are installed on the CPU port as many times as there are user ports, and this is problematic when multiple user ports share the same MAC address. Consider the default case where all user ports inherit their MAC address from the DSA master, and then the user runs: ip link set swp0 address 00:01:02:03:04:05 The above will make dsa_slave_set_mac_address() call dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's standalone database, and dsa_port_standalone_host_fdb_del() for the old address of swp0, again in swp0's standalone database. Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down to the felix driver, which will end up deleting the old MAC address from the CPU port. But this is still in use by other user ports, so we end up breaking unicast termination for them. There isn't a problem in the fact that DSA keeps track of host standalone addresses in the individual database of each user port: some drivers like sja1105 need this. There also isn't a problem in the fact that some drivers choose the same VID/FID for all standalone ports. It is just that the deletion of these host addresses must be delayed until they are known to not be in use any longer, and only the driver has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and &cpu_db->mdbs, it is just a matter of walking over those lists and see whether the same MAC address is present on the CPU port in the port db of another user port. I have considered reusing the generic dsa_port_walk_fdbs() and dsa_port_walk_mdbs() schemes for this, but locking makes it difficult. In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that we introduce an unlocked variant of the address iterator, we'd still need some relatively complex data structures, and a void *ctx in the dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are able to figure out, after iterating, whether the same MAC address is or isn't present in the port db of another port. All the above, plus the fact that I expect other drivers to follow the same model as felix where all standalone ports use the same FID, made me conclude that a generic method provided by DSA is necessary: dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this from the ->port_fdb_del() handler for the CPU port, when the database was classified to either a port db, or a LAG db. For symmetry, we also call this from ->port_fdb_add(), because if the address was installed once, then installing it a second time serves no purpose: it's already in hardware in VID 0 and it affects all standalone ports. This change moves dsa_db_equal() from switch.c to dsa.c, since it now has one more caller. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-08 17:15:15 +08:00
if (dsa_is_cpu_port(ds, port) && !bridge_dev &&
dsa_mdb_present_in_other_db(ds, port, mdb, db))
return 0;
if (port == ocelot->npi)
port = ocelot->num_phys_ports;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
return ocelot_port_mdb_del(ocelot, port, mdb, bridge_dev);
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
static void felix_bridge_stp_state_set(struct dsa_switch *ds, int port,
u8 state)
{
struct ocelot *ocelot = ds->priv;
return ocelot_bridge_stp_state_set(ocelot, port, state);
}
static int felix_pre_bridge_flags(struct dsa_switch *ds, int port,
struct switchdev_brport_flags val,
struct netlink_ext_ack *extack)
{
struct ocelot *ocelot = ds->priv;
return ocelot_port_pre_bridge_flags(ocelot, port, val);
}
static int felix_bridge_flags(struct dsa_switch *ds, int port,
struct switchdev_brport_flags val,
struct netlink_ext_ack *extack)
{
struct ocelot *ocelot = ds->priv;
if (port == ocelot->npi)
port = ocelot->num_phys_ports;
ocelot_port_bridge_flags(ocelot, port, val);
return 0;
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
static int felix_bridge_join(struct dsa_switch *ds, int port,
struct dsa_bridge bridge, bool *tx_fwd_offload,
struct netlink_ext_ack *extack)
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
{
struct ocelot *ocelot = ds->priv;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
return ocelot_port_bridge_join(ocelot, port, bridge.dev, bridge.num,
extack);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
static void felix_bridge_leave(struct dsa_switch *ds, int port,
net: dsa: keep the bridge_dev and bridge_num as part of the same structure The main desire behind this is to provide coherent bridge information to the fast path without locking. For example, right now we set dp->bridge_dev and dp->bridge_num from separate code paths, it is theoretically possible for a packet transmission to read these two port properties consecutively and find a bridge number which does not correspond with the bridge device. Another desire is to start passing more complex bridge information to dsa_switch_ops functions. For example, with FDB isolation, it is expected that drivers will need to be passed the bridge which requested an FDB/MDB entry to be offloaded, and along with that bridge_dev, the associated bridge_num should be passed too, in case the driver might want to implement an isolation scheme based on that number. We already pass the {bridge_dev, bridge_num} pair to the TX forwarding offload switch API, however we'd like to remove that and squash it into the basic bridge join/leave API. So that means we need to pass this pair to the bridge join/leave API. During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we call the driver's .port_bridge_leave with what used to be our dp->bridge_dev, but provided as an argument. When bridge_dev and bridge_num get folded into a single structure, we need to preserve this behavior in dsa_port_bridge_leave: we need a copy of what used to be in dp->bridge. Switch drivers check bridge membership by comparing dp->bridge_dev with the provided bridge_dev, but now, if we provide the struct dsa_bridge as a pointer, they cannot keep comparing dp->bridge to the provided pointer, since this only points to an on-stack copy. To make this obvious and prevent driver writers from forgetting and doing stupid things, in this new API, the struct dsa_bridge is provided as a full structure (not very large, contains an int and a pointer) instead of a pointer. An explicit comparison function needs to be used to determine bridge membership: dsa_port_offloads_bridge(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 00:57:56 +08:00
struct dsa_bridge bridge)
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
{
struct ocelot *ocelot = ds->priv;
net: dsa: keep the bridge_dev and bridge_num as part of the same structure The main desire behind this is to provide coherent bridge information to the fast path without locking. For example, right now we set dp->bridge_dev and dp->bridge_num from separate code paths, it is theoretically possible for a packet transmission to read these two port properties consecutively and find a bridge number which does not correspond with the bridge device. Another desire is to start passing more complex bridge information to dsa_switch_ops functions. For example, with FDB isolation, it is expected that drivers will need to be passed the bridge which requested an FDB/MDB entry to be offloaded, and along with that bridge_dev, the associated bridge_num should be passed too, in case the driver might want to implement an isolation scheme based on that number. We already pass the {bridge_dev, bridge_num} pair to the TX forwarding offload switch API, however we'd like to remove that and squash it into the basic bridge join/leave API. So that means we need to pass this pair to the bridge join/leave API. During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we call the driver's .port_bridge_leave with what used to be our dp->bridge_dev, but provided as an argument. When bridge_dev and bridge_num get folded into a single structure, we need to preserve this behavior in dsa_port_bridge_leave: we need a copy of what used to be in dp->bridge. Switch drivers check bridge membership by comparing dp->bridge_dev with the provided bridge_dev, but now, if we provide the struct dsa_bridge as a pointer, they cannot keep comparing dp->bridge to the provided pointer, since this only points to an on-stack copy. To make this obvious and prevent driver writers from forgetting and doing stupid things, in this new API, the struct dsa_bridge is provided as a full structure (not very large, contains an int and a pointer) instead of a pointer. An explicit comparison function needs to be used to determine bridge membership: dsa_port_offloads_bridge(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 00:57:56 +08:00
ocelot_port_bridge_leave(ocelot, port, bridge.dev);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
static int felix_lag_join(struct dsa_switch *ds, int port,
net: dsa: create a dsa_lag structure The main purpose of this change is to create a data structure for a LAG as seen by DSA. This is similar to what we have for bridging - we pass a copy of this structure by value to ->port_lag_join and ->port_lag_leave. For now we keep the lag_dev, id and a reference count in it. Future patches will add a list of FDB entries for the LAG (these also need to be refcounted to work properly). The LAG structure is created using dsa_port_lag_create() and destroyed using dsa_port_lag_destroy(), just like we have for bridging. Because now, the dsa_lag itself is refcounted, we can simplify dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in the dst->lags array only as long as at least one port uses it. The refcounting logic inside those functions can be removed now - they are called only when we should perform the operation. dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag structure instead of the lag_dev net_device. dsa_lag_foreach_port() now takes the dsa_lag structure as argument. dst->lags holds an array of dsa_lag structures. dsa_lag_map() now also saves the dsa_lag->id value, so that linear walking of dst->lags in drivers using dsa_lag_id() is no longer necessary. They can just look at lag.id. dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(), which can be used by drivers to get the LAG ID assigned by DSA to a given port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 22:00:49 +08:00
struct dsa_lag lag,
struct netdev_lag_upper_info *info)
{
struct ocelot *ocelot = ds->priv;
net: dsa: create a dsa_lag structure The main purpose of this change is to create a data structure for a LAG as seen by DSA. This is similar to what we have for bridging - we pass a copy of this structure by value to ->port_lag_join and ->port_lag_leave. For now we keep the lag_dev, id and a reference count in it. Future patches will add a list of FDB entries for the LAG (these also need to be refcounted to work properly). The LAG structure is created using dsa_port_lag_create() and destroyed using dsa_port_lag_destroy(), just like we have for bridging. Because now, the dsa_lag itself is refcounted, we can simplify dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in the dst->lags array only as long as at least one port uses it. The refcounting logic inside those functions can be removed now - they are called only when we should perform the operation. dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag structure instead of the lag_dev net_device. dsa_lag_foreach_port() now takes the dsa_lag structure as argument. dst->lags holds an array of dsa_lag structures. dsa_lag_map() now also saves the dsa_lag->id value, so that linear walking of dst->lags in drivers using dsa_lag_id() is no longer necessary. They can just look at lag.id. dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(), which can be used by drivers to get the LAG ID assigned by DSA to a given port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 22:00:49 +08:00
return ocelot_port_lag_join(ocelot, port, lag.dev, info);
}
static int felix_lag_leave(struct dsa_switch *ds, int port,
net: dsa: create a dsa_lag structure The main purpose of this change is to create a data structure for a LAG as seen by DSA. This is similar to what we have for bridging - we pass a copy of this structure by value to ->port_lag_join and ->port_lag_leave. For now we keep the lag_dev, id and a reference count in it. Future patches will add a list of FDB entries for the LAG (these also need to be refcounted to work properly). The LAG structure is created using dsa_port_lag_create() and destroyed using dsa_port_lag_destroy(), just like we have for bridging. Because now, the dsa_lag itself is refcounted, we can simplify dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in the dst->lags array only as long as at least one port uses it. The refcounting logic inside those functions can be removed now - they are called only when we should perform the operation. dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag structure instead of the lag_dev net_device. dsa_lag_foreach_port() now takes the dsa_lag structure as argument. dst->lags holds an array of dsa_lag structures. dsa_lag_map() now also saves the dsa_lag->id value, so that linear walking of dst->lags in drivers using dsa_lag_id() is no longer necessary. They can just look at lag.id. dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(), which can be used by drivers to get the LAG ID assigned by DSA to a given port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 22:00:49 +08:00
struct dsa_lag lag)
{
struct ocelot *ocelot = ds->priv;
net: dsa: create a dsa_lag structure The main purpose of this change is to create a data structure for a LAG as seen by DSA. This is similar to what we have for bridging - we pass a copy of this structure by value to ->port_lag_join and ->port_lag_leave. For now we keep the lag_dev, id and a reference count in it. Future patches will add a list of FDB entries for the LAG (these also need to be refcounted to work properly). The LAG structure is created using dsa_port_lag_create() and destroyed using dsa_port_lag_destroy(), just like we have for bridging. Because now, the dsa_lag itself is refcounted, we can simplify dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in the dst->lags array only as long as at least one port uses it. The refcounting logic inside those functions can be removed now - they are called only when we should perform the operation. dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag structure instead of the lag_dev net_device. dsa_lag_foreach_port() now takes the dsa_lag structure as argument. dst->lags holds an array of dsa_lag structures. dsa_lag_map() now also saves the dsa_lag->id value, so that linear walking of dst->lags in drivers using dsa_lag_id() is no longer necessary. They can just look at lag.id. dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(), which can be used by drivers to get the LAG ID assigned by DSA to a given port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 22:00:49 +08:00
ocelot_port_lag_leave(ocelot, port, lag.dev);
return 0;
}
static int felix_lag_change(struct dsa_switch *ds, int port)
{
struct dsa_port *dp = dsa_to_port(ds, port);
struct ocelot *ocelot = ds->priv;
ocelot_port_lag_change(ocelot, port, dp->lag_tx_enabled);
return 0;
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
static int felix_vlan_prepare(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_vlan *vlan,
struct netlink_ext_ack *extack)
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
{
struct ocelot *ocelot = ds->priv;
net: switchdev: remove vid_begin -> vid_end range from VLAN objects The call path of a switchdev VLAN addition to the bridge looks something like this today: nbp_vlan_init | __br_vlan_set_default_pvid | | | | | br_afspec | | | | | | | v | | | br_process_vlan_info | | | | | | | v | | | br_vlan_info | | | / \ / | | / \ / | | / \ / | | / \ / v v v v v nbp_vlan_add br_vlan_add ------+ | ^ ^ | | | / | | | | / / / | \ br_vlan_get_master/ / v \ ^ / / br_vlan_add_existing \ | / / | \ | / / / \ | / / / \ | / / / \ | / / / v | | v / __vlan_add / / | / / | / v | / __vlan_vid_add | / \ | / v v v br_switchdev_port_vlan_add The ranges UAPI was introduced to the bridge in commit bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec) have always been passed one by one, through struct bridge_vlan_info tmp_vinfo, to br_vlan_info. So the range never went too far in depth. Then Scott Feldman introduced the switchdev_port_bridge_setlink function in commit 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink"). That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made full use of the range. But switchdev_port_bridge_setlink was called like this: br_setlink -> br_afspec -> switchdev_port_bridge_setlink Basically, the switchdev and the bridge code were not tightly integrated. Then commit 41c498b9359e ("bridge: restore br_setlink back to original") came, and switchdev drivers were required to implement .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while. In the meantime, commits such as 0944d6b5a2fa ("bridge: try switchdev op first in __vlan_vid_add/del") finally made switchdev penetrate the br_vlan_info() barrier and start to develop the call path we have today. But remember, br_vlan_info() still receives VLANs one by one. Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from switchdev") so that drivers would not implement .ndo_bridge_setlink any longer. The switchdev_port_bridge_setlink also got deleted. This refactoring removed the parallel bridge_setlink implementation from switchdev, and left the only switchdev VLAN objects to be the ones offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add (the latter coming from commit 9c86ce2c1ae3 ("net: bridge: Notify about bridge VLANs")). That is to say, today the switchdev VLAN object ranges are not used in the kernel. Refactoring the above call path is a bit complicated, when the bridge VLAN call path is already a bit complicated. Let's go off and finish the job of commit 29ab586c3d83 by deleting the bogus iteration through the VLAN ranges from the drivers. Some aspects of this feature never made too much sense in the first place. For example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID flag supposed to mean, when a port can obviously have a single pvid? This particular configuration _is_ denied as of commit 6623c60dc28e ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API perspective, the driver still has to play pretend, and only offload the vlan->vid_end as pvid. And the addition of a switchdev VLAN object can modify the flags of another, completely unrelated, switchdev VLAN object! (a VLAN that is PVID will invalidate the PVID flag from whatever other VLAN had previously been offloaded with switchdev and had that flag. Yet switchdev never notifies about that change, drivers are supposed to guess). Nonetheless, having a VLAN range in the API makes error handling look scarier than it really is - unwinding on errors and all of that. When in reality, no one really calls this API with more than one VLAN. It is all unnecessary complexity. And despite appearing pretentious (two-phase transactional model and all), the switchdev API is really sloppy because the VLAN addition and removal operations are not paired with one another (you can add a VLAN 100 times and delete it just once). The bridge notifies through switchdev of a VLAN addition not only when the flags of an existing VLAN change, but also when nothing changes. There are switchdev drivers out there who don't like adding a VLAN that has already been added, and those checks don't really belong at driver level. But the fact that the API contains ranges is yet another factor that prevents this from being addressed in the future. Of the existing switchdev pieces of hardware, it appears that only Mellanox Spectrum supports offloading more than one VLAN at a time, through mlxsw_sp_port_vlan_set. I have kept that code internal to the driver, because there is some more bookkeeping that makes use of it, but I deleted it from the switchdev API. But since the switchdev support for ranges has already been de facto deleted by a Mellanox employee and nobody noticed for 4 years, I'm going to assume it's not a biggie. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 08:01:46 +08:00
u16 flags = vlan->flags;
/* Ocelot switches copy frames as-is to the CPU, so the flags:
* egress-untagged or not, pvid or not, make no difference. This
* behavior is already better than what DSA just tries to approximate
* when it installs the VLAN with the same flags on the CPU port.
* Just accept any configuration, and don't let ocelot deny installing
* multiple native VLANs on the NPI port, because the switch doesn't
* look at the port tag settings towards the NPI interface anyway.
*/
if (port == ocelot->npi)
return 0;
net: switchdev: remove vid_begin -> vid_end range from VLAN objects The call path of a switchdev VLAN addition to the bridge looks something like this today: nbp_vlan_init | __br_vlan_set_default_pvid | | | | | br_afspec | | | | | | | v | | | br_process_vlan_info | | | | | | | v | | | br_vlan_info | | | / \ / | | / \ / | | / \ / | | / \ / v v v v v nbp_vlan_add br_vlan_add ------+ | ^ ^ | | | / | | | | / / / | \ br_vlan_get_master/ / v \ ^ / / br_vlan_add_existing \ | / / | \ | / / / \ | / / / \ | / / / \ | / / / v | | v / __vlan_add / / | / / | / v | / __vlan_vid_add | / \ | / v v v br_switchdev_port_vlan_add The ranges UAPI was introduced to the bridge in commit bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec) have always been passed one by one, through struct bridge_vlan_info tmp_vinfo, to br_vlan_info. So the range never went too far in depth. Then Scott Feldman introduced the switchdev_port_bridge_setlink function in commit 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink"). That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made full use of the range. But switchdev_port_bridge_setlink was called like this: br_setlink -> br_afspec -> switchdev_port_bridge_setlink Basically, the switchdev and the bridge code were not tightly integrated. Then commit 41c498b9359e ("bridge: restore br_setlink back to original") came, and switchdev drivers were required to implement .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while. In the meantime, commits such as 0944d6b5a2fa ("bridge: try switchdev op first in __vlan_vid_add/del") finally made switchdev penetrate the br_vlan_info() barrier and start to develop the call path we have today. But remember, br_vlan_info() still receives VLANs one by one. Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from switchdev") so that drivers would not implement .ndo_bridge_setlink any longer. The switchdev_port_bridge_setlink also got deleted. This refactoring removed the parallel bridge_setlink implementation from switchdev, and left the only switchdev VLAN objects to be the ones offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add (the latter coming from commit 9c86ce2c1ae3 ("net: bridge: Notify about bridge VLANs")). That is to say, today the switchdev VLAN object ranges are not used in the kernel. Refactoring the above call path is a bit complicated, when the bridge VLAN call path is already a bit complicated. Let's go off and finish the job of commit 29ab586c3d83 by deleting the bogus iteration through the VLAN ranges from the drivers. Some aspects of this feature never made too much sense in the first place. For example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID flag supposed to mean, when a port can obviously have a single pvid? This particular configuration _is_ denied as of commit 6623c60dc28e ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API perspective, the driver still has to play pretend, and only offload the vlan->vid_end as pvid. And the addition of a switchdev VLAN object can modify the flags of another, completely unrelated, switchdev VLAN object! (a VLAN that is PVID will invalidate the PVID flag from whatever other VLAN had previously been offloaded with switchdev and had that flag. Yet switchdev never notifies about that change, drivers are supposed to guess). Nonetheless, having a VLAN range in the API makes error handling look scarier than it really is - unwinding on errors and all of that. When in reality, no one really calls this API with more than one VLAN. It is all unnecessary complexity. And despite appearing pretentious (two-phase transactional model and all), the switchdev API is really sloppy because the VLAN addition and removal operations are not paired with one another (you can add a VLAN 100 times and delete it just once). The bridge notifies through switchdev of a VLAN addition not only when the flags of an existing VLAN change, but also when nothing changes. There are switchdev drivers out there who don't like adding a VLAN that has already been added, and those checks don't really belong at driver level. But the fact that the API contains ranges is yet another factor that prevents this from being addressed in the future. Of the existing switchdev pieces of hardware, it appears that only Mellanox Spectrum supports offloading more than one VLAN at a time, through mlxsw_sp_port_vlan_set. I have kept that code internal to the driver, because there is some more bookkeeping that makes use of it, but I deleted it from the switchdev API. But since the switchdev support for ranges has already been de facto deleted by a Mellanox employee and nobody noticed for 4 years, I'm going to assume it's not a biggie. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 08:01:46 +08:00
return ocelot_vlan_prepare(ocelot, port, vlan->vid,
flags & BRIDGE_VLAN_INFO_PVID,
flags & BRIDGE_VLAN_INFO_UNTAGGED,
extack);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
static int felix_vlan_filtering(struct dsa_switch *ds, int port, bool enabled,
struct netlink_ext_ack *extack)
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
{
struct ocelot *ocelot = ds->priv;
return ocelot_port_vlan_filtering(ocelot, port, enabled, extack);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
net: dsa: remove the transactional logic from VLAN objects It should be the driver's business to logically separate its VLAN offloading into a preparation and a commit phase, and some drivers don't need / can't do this. So remove the transactional shim from DSA and let drivers propagate errors directly from the .port_vlan_add callback. It would appear that the code has worse error handling now than it had before. DSA is the only in-kernel user of switchdev that offloads one switchdev object to more than one port: for every VLAN object offloaded to a user port, that VLAN is also offloaded to the CPU port. So the "prepare for user port -> check for errors -> prepare for CPU port -> check for errors -> commit for user port -> commit for CPU port" sequence appears to make more sense than the one we are using now: "offload to user port -> check for errors -> offload to CPU port -> check for errors", but it is really a compromise. In the new way, we can catch errors from the commit phase that we previously had to ignore. But we have our hands tied and cannot do any rollback now: if we add a VLAN on the CPU port and it fails, we can't do the rollback by simply deleting it from the user port, because the switchdev API is not so nice with us: it could have simply been there already, even with the same flags. So we don't even attempt to rollback anything on addition error, just leave whatever VLANs managed to get offloaded right where they are. This should not be a problem at all in practice. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 08:01:53 +08:00
static int felix_vlan_add(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_vlan *vlan,
struct netlink_ext_ack *extack)
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
{
struct ocelot *ocelot = ds->priv;
u16 flags = vlan->flags;
net: dsa: remove the transactional logic from VLAN objects It should be the driver's business to logically separate its VLAN offloading into a preparation and a commit phase, and some drivers don't need / can't do this. So remove the transactional shim from DSA and let drivers propagate errors directly from the .port_vlan_add callback. It would appear that the code has worse error handling now than it had before. DSA is the only in-kernel user of switchdev that offloads one switchdev object to more than one port: for every VLAN object offloaded to a user port, that VLAN is also offloaded to the CPU port. So the "prepare for user port -> check for errors -> prepare for CPU port -> check for errors -> commit for user port -> commit for CPU port" sequence appears to make more sense than the one we are using now: "offload to user port -> check for errors -> offload to CPU port -> check for errors", but it is really a compromise. In the new way, we can catch errors from the commit phase that we previously had to ignore. But we have our hands tied and cannot do any rollback now: if we add a VLAN on the CPU port and it fails, we can't do the rollback by simply deleting it from the user port, because the switchdev API is not so nice with us: it could have simply been there already, even with the same flags. So we don't even attempt to rollback anything on addition error, just leave whatever VLANs managed to get offloaded right where they are. This should not be a problem at all in practice. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 08:01:53 +08:00
int err;
err = felix_vlan_prepare(ds, port, vlan, extack);
net: dsa: remove the transactional logic from VLAN objects It should be the driver's business to logically separate its VLAN offloading into a preparation and a commit phase, and some drivers don't need / can't do this. So remove the transactional shim from DSA and let drivers propagate errors directly from the .port_vlan_add callback. It would appear that the code has worse error handling now than it had before. DSA is the only in-kernel user of switchdev that offloads one switchdev object to more than one port: for every VLAN object offloaded to a user port, that VLAN is also offloaded to the CPU port. So the "prepare for user port -> check for errors -> prepare for CPU port -> check for errors -> commit for user port -> commit for CPU port" sequence appears to make more sense than the one we are using now: "offload to user port -> check for errors -> offload to CPU port -> check for errors", but it is really a compromise. In the new way, we can catch errors from the commit phase that we previously had to ignore. But we have our hands tied and cannot do any rollback now: if we add a VLAN on the CPU port and it fails, we can't do the rollback by simply deleting it from the user port, because the switchdev API is not so nice with us: it could have simply been there already, even with the same flags. So we don't even attempt to rollback anything on addition error, just leave whatever VLANs managed to get offloaded right where they are. This should not be a problem at all in practice. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 08:01:53 +08:00
if (err)
return err;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
net: dsa: remove the transactional logic from VLAN objects It should be the driver's business to logically separate its VLAN offloading into a preparation and a commit phase, and some drivers don't need / can't do this. So remove the transactional shim from DSA and let drivers propagate errors directly from the .port_vlan_add callback. It would appear that the code has worse error handling now than it had before. DSA is the only in-kernel user of switchdev that offloads one switchdev object to more than one port: for every VLAN object offloaded to a user port, that VLAN is also offloaded to the CPU port. So the "prepare for user port -> check for errors -> prepare for CPU port -> check for errors -> commit for user port -> commit for CPU port" sequence appears to make more sense than the one we are using now: "offload to user port -> check for errors -> offload to CPU port -> check for errors", but it is really a compromise. In the new way, we can catch errors from the commit phase that we previously had to ignore. But we have our hands tied and cannot do any rollback now: if we add a VLAN on the CPU port and it fails, we can't do the rollback by simply deleting it from the user port, because the switchdev API is not so nice with us: it could have simply been there already, even with the same flags. So we don't even attempt to rollback anything on addition error, just leave whatever VLANs managed to get offloaded right where they are. This should not be a problem at all in practice. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 08:01:53 +08:00
return ocelot_vlan_add(ocelot, port, vlan->vid,
flags & BRIDGE_VLAN_INFO_PVID,
flags & BRIDGE_VLAN_INFO_UNTAGGED);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
static int felix_vlan_del(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_vlan *vlan)
{
struct ocelot *ocelot = ds->priv;
net: switchdev: remove vid_begin -> vid_end range from VLAN objects The call path of a switchdev VLAN addition to the bridge looks something like this today: nbp_vlan_init | __br_vlan_set_default_pvid | | | | | br_afspec | | | | | | | v | | | br_process_vlan_info | | | | | | | v | | | br_vlan_info | | | / \ / | | / \ / | | / \ / | | / \ / v v v v v nbp_vlan_add br_vlan_add ------+ | ^ ^ | | | / | | | | / / / | \ br_vlan_get_master/ / v \ ^ / / br_vlan_add_existing \ | / / | \ | / / / \ | / / / \ | / / / \ | / / / v | | v / __vlan_add / / | / / | / v | / __vlan_vid_add | / \ | / v v v br_switchdev_port_vlan_add The ranges UAPI was introduced to the bridge in commit bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec) have always been passed one by one, through struct bridge_vlan_info tmp_vinfo, to br_vlan_info. So the range never went too far in depth. Then Scott Feldman introduced the switchdev_port_bridge_setlink function in commit 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink"). That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made full use of the range. But switchdev_port_bridge_setlink was called like this: br_setlink -> br_afspec -> switchdev_port_bridge_setlink Basically, the switchdev and the bridge code were not tightly integrated. Then commit 41c498b9359e ("bridge: restore br_setlink back to original") came, and switchdev drivers were required to implement .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while. In the meantime, commits such as 0944d6b5a2fa ("bridge: try switchdev op first in __vlan_vid_add/del") finally made switchdev penetrate the br_vlan_info() barrier and start to develop the call path we have today. But remember, br_vlan_info() still receives VLANs one by one. Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from switchdev") so that drivers would not implement .ndo_bridge_setlink any longer. The switchdev_port_bridge_setlink also got deleted. This refactoring removed the parallel bridge_setlink implementation from switchdev, and left the only switchdev VLAN objects to be the ones offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add (the latter coming from commit 9c86ce2c1ae3 ("net: bridge: Notify about bridge VLANs")). That is to say, today the switchdev VLAN object ranges are not used in the kernel. Refactoring the above call path is a bit complicated, when the bridge VLAN call path is already a bit complicated. Let's go off and finish the job of commit 29ab586c3d83 by deleting the bogus iteration through the VLAN ranges from the drivers. Some aspects of this feature never made too much sense in the first place. For example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID flag supposed to mean, when a port can obviously have a single pvid? This particular configuration _is_ denied as of commit 6623c60dc28e ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API perspective, the driver still has to play pretend, and only offload the vlan->vid_end as pvid. And the addition of a switchdev VLAN object can modify the flags of another, completely unrelated, switchdev VLAN object! (a VLAN that is PVID will invalidate the PVID flag from whatever other VLAN had previously been offloaded with switchdev and had that flag. Yet switchdev never notifies about that change, drivers are supposed to guess). Nonetheless, having a VLAN range in the API makes error handling look scarier than it really is - unwinding on errors and all of that. When in reality, no one really calls this API with more than one VLAN. It is all unnecessary complexity. And despite appearing pretentious (two-phase transactional model and all), the switchdev API is really sloppy because the VLAN addition and removal operations are not paired with one another (you can add a VLAN 100 times and delete it just once). The bridge notifies through switchdev of a VLAN addition not only when the flags of an existing VLAN change, but also when nothing changes. There are switchdev drivers out there who don't like adding a VLAN that has already been added, and those checks don't really belong at driver level. But the fact that the API contains ranges is yet another factor that prevents this from being addressed in the future. Of the existing switchdev pieces of hardware, it appears that only Mellanox Spectrum supports offloading more than one VLAN at a time, through mlxsw_sp_port_vlan_set. I have kept that code internal to the driver, because there is some more bookkeeping that makes use of it, but I deleted it from the switchdev API. But since the switchdev support for ranges has already been de facto deleted by a Mellanox employee and nobody noticed for 4 years, I'm going to assume it's not a biggie. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 08:01:46 +08:00
return ocelot_vlan_del(ocelot, port, vlan->vid);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
static void felix_phylink_get_caps(struct dsa_switch *ds, int port,
struct phylink_config *config)
{
struct ocelot *ocelot = ds->priv;
/* This driver does not make use of the speed, duplex, pause or the
* advertisement in its mac_config, so it is safe to mark this driver
* as non-legacy.
*/
config->legacy_pre_march2020 = false;
__set_bit(ocelot->ports[port]->phy_mode,
config->supported_interfaces);
}
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
static void felix_phylink_validate(struct dsa_switch *ds, int port,
unsigned long *supported,
struct phylink_link_state *state)
{
struct ocelot *ocelot = ds->priv;
struct felix *felix = ocelot_to_felix(ocelot);
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
if (felix->info->phylink_validate)
felix->info->phylink_validate(ocelot, port, supported, state);
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
}
static struct phylink_pcs *felix_phylink_mac_select_pcs(struct dsa_switch *ds,
int port,
phy_interface_t iface)
{
struct ocelot *ocelot = ds->priv;
struct felix *felix = ocelot_to_felix(ocelot);
struct phylink_pcs *pcs = NULL;
if (felix->pcs && felix->pcs[port])
pcs = felix->pcs[port];
return pcs;
}
static void felix_phylink_mac_link_down(struct dsa_switch *ds, int port,
unsigned int link_an_mode,
phy_interface_t interface)
{
struct ocelot *ocelot = ds->priv;
net: dsa: felix: implement port flushing on .phylink_mac_link_down There are several issues which may be seen when the link goes down while forwarding traffic, all of which can be attributed to the fact that the port flushing procedure from the reference manual was not closely followed. With flow control enabled on both the ingress port and the egress port, it may happen when a link goes down that Ethernet packets are in flight. In flow control mode, frames are held back and not dropped. When there is enough traffic in flight (example: iperf3 TCP), then the ingress port might enter congestion and never exit that state. This is a problem, because it is the egress port's link that went down, and that has caused the inability of the ingress port to send packets to any other port. This is solved by flushing the egress port's queues when it goes down. There is also a problem when performing stream splitting for IEEE 802.1CB traffic (not yet upstream, but a sort of multicast, basically). There, if one port from the destination ports mask goes down, splitting the stream towards the other destinations will no longer be performed. This can be traced down to this line: ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); which should have been instead, as per the reference manual: ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, DEV_MAC_ENA_CFG); Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is the case, but apparently multicasting to several ports will cause issues if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set. I am not sure what the state of the Ocelot VSC7514 driver is, but probably not as bad as Felix/Seville, since VSC7514 uses phylib and has the following in ocelot_adjust_link: if (!phydev->link) return; therefore the port is not really put down when the link is lost, unlike the DSA drivers which use .phylink_mac_link_down for that. Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h which are not exported in include/soc/mscc/ and a bugfix patch should probably not move headers around. Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 01:36:27 +08:00
net: mscc: ocelot: convert to phylink The felix DSA driver, which is a wrapper over the same hardware class as ocelot, is integrated with phylink, but ocelot is using the plain PHY library. It makes sense to bring together the two implementations, which is what this patch achieves. This is a large patch and hard to break up, but it does the following: The existing ocelot_adjust_link writes some registers, and felix_phylink_mac_link_up writes some registers, some of them are common, but both functions write to some registers to which the other doesn't. The main reasons for this are: - Felix switches so far have used an NXP PCS so they had no need to write the PCS1G registers that ocelot_adjust_link writes - Felix switches have the MAC fixed at 1G, so some of the MAC speed changes actually break the link and must be avoided. The naming conventions for the functions introduced in this patch are: - vsc7514_phylink_{mac_config,validate} are specific to the Ocelot instantiations and placed in ocelot_net.c which is built only for the ocelot switchdev driver. - ocelot_phylink_mac_link_{up,down} are shared between the ocelot switchdev driver and the felix DSA driver (they are put in the common lib). One by one, the registers written by ocelot_adjust_link are: DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this register since its out-of-reset value was fine and did not need changing. The write is moved to the common ocelot_phylink_mac_link_up and on felix it is guarded by a quirk bit that makes the written value identical with the out-of-reset one DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config PCS1G_MODE_CFG - same as above PCS1G_SD_CFG - same as above PCS1G_CFG - same as above PCS1G_ANEG_CFG - same as above PCS1G_LB_CFG - same as above DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable touched this. felix_phylink_mac_link_{up,down} also do. We go with what felix does and put it in ocelot_phylink_mac_link_up. DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both write this, but to different values. Move to the common ocelot_phylink_mac_link_up and make sure via the quirk that the old values are preserved for both. ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up did not. Runtime invariant, speed does not matter since PFC is disabled via the RX_PFC_ENA bits which are cleared. Move to vsc7514_phylink_mac_config. QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and felix_phylink_mac_link_{up,down} wrote this. Ocelot also wrote this register from ocelot_port_disable. Keep what felix did, move in ocelot_phylink_mac_link_{up,down} and delete ocelot_port_disable. ANA_POL_FLOWC - same as above SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas ocelot always enabled RX and TX flow control, felix listened to phylink (for the most part, at least - see the 2500base-X comment). The registers which only felix_phylink_mac_link_up wrote are: SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control worked on ocelot. Not it should, since the code is shared with felix where it does. ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink should be the one touching them, deleted. Other changes: - The old phylib registration code was in mscc_ocelot_init_ports. It is hard to work with 2 levels of indentation already in, and with hard to follow teardown logic. The new phylink registration code was moved inside ocelot_probe_port(), right between alloc_etherdev() and register_netdev(). It could not be done before (=> outside of) ocelot_probe_port() because ocelot_probe_port() allocates the struct ocelot_port which we then use to assign ocelot_port->phy_mode to. It is more preferable to me to have all PHY handling logic inside the same function. - On the same topic: struct ocelot_port_private :: serdes is only used in ocelot_port_open to set the SERDES protocol to Ethernet. This is logically a runtime invariant and can be done just once, when the port registers with phylink. We therefore don't even need to keep the serdes reference inside struct ocelot_port_private, or to use the devm variant of of_phy_get(). - Phylink needs a valid phy-mode for phylink_create() to succeed, and the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts don't define one for the internal PHY ports. So we patch PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL. - There was a strategically placed: switch (priv->phy_mode) { case PHY_INTERFACE_MODE_NA: continue; which made the code skip the serdes initialization for the internal PHY ports. Frankly that is not all that obvious, so now we explicitly initialize the serdes under an "if" condition and not rely on code jumps, so everything is clearer. - There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII ports. Since that is in fact the default value for the register field DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out of reset, which does match the comment. I don't even want to know why this code is placed there, but if there is indeed an issue that all ports that share a QSGMII lane must all be up, then this logic is already buggy, since mscc_ocelot_init_ports iterates using for_each_available_child_of_node, so nobody prevents the user from putting a 'status = "disabled";' for some QSGMII ports which would break the driver's assumption. In any case, in the eventuality that I'm right, we would have yet another issue if ocelot_phylink_mac_link_down would reset those ports and that would be forbidden, so since the ocelot_adjust_link logic did not do that (maybe for a reason), add another quirk to preserve the old logic. The ocelot driver teardown goes through all ports in one fell swoop. When initialization of one port fails, the ocelot->ports[port] pointer for that is reset to NULL, and teardown is done only for non-NULL ports, so there is no reason to do partial teardowns, let the central mscc_ocelot_release_ports() do its job. Tested bind, unbind, rebind, link up, link down, speed change on mock-up hardware (modified the driver to probe on Felix VSC9959). Also regression tested the felix DSA driver. Could not test the Ocelot specific bits (PCS1G, SERDES, device tree bindings). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-15 09:47:48 +08:00
ocelot_phylink_mac_link_down(ocelot, port, link_an_mode, interface,
FELIX_MAC_QUIRKS);
}
static void felix_phylink_mac_link_up(struct dsa_switch *ds, int port,
unsigned int link_an_mode,
phy_interface_t interface,
struct phy_device *phydev,
int speed, int duplex,
bool tx_pause, bool rx_pause)
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
{
struct ocelot *ocelot = ds->priv;
struct felix *felix = ocelot_to_felix(ocelot);
net: mscc: ocelot: convert to phylink The felix DSA driver, which is a wrapper over the same hardware class as ocelot, is integrated with phylink, but ocelot is using the plain PHY library. It makes sense to bring together the two implementations, which is what this patch achieves. This is a large patch and hard to break up, but it does the following: The existing ocelot_adjust_link writes some registers, and felix_phylink_mac_link_up writes some registers, some of them are common, but both functions write to some registers to which the other doesn't. The main reasons for this are: - Felix switches so far have used an NXP PCS so they had no need to write the PCS1G registers that ocelot_adjust_link writes - Felix switches have the MAC fixed at 1G, so some of the MAC speed changes actually break the link and must be avoided. The naming conventions for the functions introduced in this patch are: - vsc7514_phylink_{mac_config,validate} are specific to the Ocelot instantiations and placed in ocelot_net.c which is built only for the ocelot switchdev driver. - ocelot_phylink_mac_link_{up,down} are shared between the ocelot switchdev driver and the felix DSA driver (they are put in the common lib). One by one, the registers written by ocelot_adjust_link are: DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this register since its out-of-reset value was fine and did not need changing. The write is moved to the common ocelot_phylink_mac_link_up and on felix it is guarded by a quirk bit that makes the written value identical with the out-of-reset one DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config PCS1G_MODE_CFG - same as above PCS1G_SD_CFG - same as above PCS1G_CFG - same as above PCS1G_ANEG_CFG - same as above PCS1G_LB_CFG - same as above DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable touched this. felix_phylink_mac_link_{up,down} also do. We go with what felix does and put it in ocelot_phylink_mac_link_up. DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both write this, but to different values. Move to the common ocelot_phylink_mac_link_up and make sure via the quirk that the old values are preserved for both. ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up did not. Runtime invariant, speed does not matter since PFC is disabled via the RX_PFC_ENA bits which are cleared. Move to vsc7514_phylink_mac_config. QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and felix_phylink_mac_link_{up,down} wrote this. Ocelot also wrote this register from ocelot_port_disable. Keep what felix did, move in ocelot_phylink_mac_link_{up,down} and delete ocelot_port_disable. ANA_POL_FLOWC - same as above SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas ocelot always enabled RX and TX flow control, felix listened to phylink (for the most part, at least - see the 2500base-X comment). The registers which only felix_phylink_mac_link_up wrote are: SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control worked on ocelot. Not it should, since the code is shared with felix where it does. ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink should be the one touching them, deleted. Other changes: - The old phylib registration code was in mscc_ocelot_init_ports. It is hard to work with 2 levels of indentation already in, and with hard to follow teardown logic. The new phylink registration code was moved inside ocelot_probe_port(), right between alloc_etherdev() and register_netdev(). It could not be done before (=> outside of) ocelot_probe_port() because ocelot_probe_port() allocates the struct ocelot_port which we then use to assign ocelot_port->phy_mode to. It is more preferable to me to have all PHY handling logic inside the same function. - On the same topic: struct ocelot_port_private :: serdes is only used in ocelot_port_open to set the SERDES protocol to Ethernet. This is logically a runtime invariant and can be done just once, when the port registers with phylink. We therefore don't even need to keep the serdes reference inside struct ocelot_port_private, or to use the devm variant of of_phy_get(). - Phylink needs a valid phy-mode for phylink_create() to succeed, and the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts don't define one for the internal PHY ports. So we patch PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL. - There was a strategically placed: switch (priv->phy_mode) { case PHY_INTERFACE_MODE_NA: continue; which made the code skip the serdes initialization for the internal PHY ports. Frankly that is not all that obvious, so now we explicitly initialize the serdes under an "if" condition and not rely on code jumps, so everything is clearer. - There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII ports. Since that is in fact the default value for the register field DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out of reset, which does match the comment. I don't even want to know why this code is placed there, but if there is indeed an issue that all ports that share a QSGMII lane must all be up, then this logic is already buggy, since mscc_ocelot_init_ports iterates using for_each_available_child_of_node, so nobody prevents the user from putting a 'status = "disabled";' for some QSGMII ports which would break the driver's assumption. In any case, in the eventuality that I'm right, we would have yet another issue if ocelot_phylink_mac_link_down would reset those ports and that would be forbidden, so since the ocelot_adjust_link logic did not do that (maybe for a reason), add another quirk to preserve the old logic. The ocelot driver teardown goes through all ports in one fell swoop. When initialization of one port fails, the ocelot->ports[port] pointer for that is reset to NULL, and teardown is done only for non-NULL ports, so there is no reason to do partial teardowns, let the central mscc_ocelot_release_ports() do its job. Tested bind, unbind, rebind, link up, link down, speed change on mock-up hardware (modified the driver to probe on Felix VSC9959). Also regression tested the felix DSA driver. Could not test the Ocelot specific bits (PCS1G, SERDES, device tree bindings). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-15 09:47:48 +08:00
ocelot_phylink_mac_link_up(ocelot, port, phydev, link_an_mode,
interface, speed, duplex, tx_pause, rx_pause,
FELIX_MAC_QUIRKS);
if (felix->info->port_sched_speed_set)
felix->info->port_sched_speed_set(ocelot, port, speed);
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
}
static void felix_port_qos_map_init(struct ocelot *ocelot, int port)
{
int i;
ocelot_rmw_gix(ocelot,
ANA_PORT_QOS_CFG_QOS_PCP_ENA,
ANA_PORT_QOS_CFG_QOS_PCP_ENA,
ANA_PORT_QOS_CFG,
port);
for (i = 0; i < OCELOT_NUM_TC * 2; i++) {
ocelot_rmw_ix(ocelot,
(ANA_PORT_PCP_DEI_MAP_DP_PCP_DEI_VAL & i) |
ANA_PORT_PCP_DEI_MAP_QOS_PCP_DEI_VAL(i),
ANA_PORT_PCP_DEI_MAP_DP_PCP_DEI_VAL |
ANA_PORT_PCP_DEI_MAP_QOS_PCP_DEI_VAL_M,
ANA_PORT_PCP_DEI_MAP,
port, i);
}
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
static void felix_get_strings(struct dsa_switch *ds, int port,
u32 stringset, u8 *data)
{
struct ocelot *ocelot = ds->priv;
return ocelot_get_strings(ocelot, port, stringset, data);
}
static void felix_get_ethtool_stats(struct dsa_switch *ds, int port, u64 *data)
{
struct ocelot *ocelot = ds->priv;
ocelot_get_ethtool_stats(ocelot, port, data);
}
static int felix_get_sset_count(struct dsa_switch *ds, int port, int sset)
{
struct ocelot *ocelot = ds->priv;
return ocelot_get_sset_count(ocelot, port, sset);
}
static int felix_get_ts_info(struct dsa_switch *ds, int port,
struct ethtool_ts_info *info)
{
struct ocelot *ocelot = ds->priv;
return ocelot_get_ts_info(ocelot, port, info);
}
static const u32 felix_phy_match_table[PHY_INTERFACE_MODE_MAX] = {
[PHY_INTERFACE_MODE_INTERNAL] = OCELOT_PORT_MODE_INTERNAL,
[PHY_INTERFACE_MODE_SGMII] = OCELOT_PORT_MODE_SGMII,
[PHY_INTERFACE_MODE_QSGMII] = OCELOT_PORT_MODE_QSGMII,
[PHY_INTERFACE_MODE_USXGMII] = OCELOT_PORT_MODE_USXGMII,
[PHY_INTERFACE_MODE_1000BASEX] = OCELOT_PORT_MODE_1000BASEX,
[PHY_INTERFACE_MODE_2500BASEX] = OCELOT_PORT_MODE_2500BASEX,
};
static int felix_validate_phy_mode(struct felix *felix, int port,
phy_interface_t phy_mode)
{
u32 modes = felix->info->port_modes[port];
if (felix_phy_match_table[phy_mode] & modes)
return 0;
return -EOPNOTSUPP;
}
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
static int felix_parse_ports_node(struct felix *felix,
struct device_node *ports_node,
phy_interface_t *port_phy_modes)
{
struct device *dev = felix->ocelot.dev;
struct device_node *child;
for_each_available_child_of_node(ports_node, child) {
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
phy_interface_t phy_mode;
u32 port;
int err;
/* Get switch port number from DT */
if (of_property_read_u32(child, "reg", &port) < 0) {
dev_err(dev, "Port number not defined in device tree "
"(property \"reg\")\n");
of_node_put(child);
return -ENODEV;
}
/* Get PHY mode from DT */
err = of_get_phy_mode(child, &phy_mode);
if (err) {
dev_err(dev, "Failed to read phy-mode or "
"phy-interface-type property for port %d\n",
port);
of_node_put(child);
return -ENODEV;
}
err = felix_validate_phy_mode(felix, port, phy_mode);
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
if (err < 0) {
dev_err(dev, "Unsupported PHY mode %s on port %d\n",
phy_modes(phy_mode), port);
of_node_put(child);
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
return err;
}
port_phy_modes[port] = phy_mode;
}
return 0;
}
static int felix_parse_dt(struct felix *felix, phy_interface_t *port_phy_modes)
{
struct device *dev = felix->ocelot.dev;
struct device_node *switch_node;
struct device_node *ports_node;
int err;
switch_node = dev->of_node;
ports_node = of_get_child_by_name(switch_node, "ports");
if (!ports_node)
ports_node = of_get_child_by_name(switch_node, "ethernet-ports");
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
if (!ports_node) {
dev_err(dev, "Incorrect bindings: absent \"ports\" or \"ethernet-ports\" node\n");
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
return -ENODEV;
}
err = felix_parse_ports_node(felix, ports_node, port_phy_modes);
of_node_put(ports_node);
return err;
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
static int felix_init_structs(struct felix *felix, int num_phys_ports)
{
struct ocelot *ocelot = &felix->ocelot;
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
phy_interface_t *port_phy_modes;
struct resource res;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
int port, i, err;
ocelot->num_phys_ports = num_phys_ports;
ocelot->ports = devm_kcalloc(ocelot->dev, num_phys_ports,
sizeof(struct ocelot_port *), GFP_KERNEL);
if (!ocelot->ports)
return -ENOMEM;
ocelot->map = felix->info->map;
ocelot->stats_layout = felix->info->stats_layout;
ocelot->num_mact_rows = felix->info->num_mact_rows;
ocelot->vcap = felix->info->vcap;
ocelot->vcap_pol.base = felix->info->vcap_pol_base;
ocelot->vcap_pol.max = felix->info->vcap_pol_max;
ocelot->vcap_pol.base2 = felix->info->vcap_pol_base2;
ocelot->vcap_pol.max2 = felix->info->vcap_pol_max2;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
ocelot->ops = felix->info->ops;
ocelot->npi_inj_prefix = OCELOT_TAG_PREFIX_SHORT;
ocelot->npi_xtr_prefix = OCELOT_TAG_PREFIX_SHORT;
net: mscc: ocelot: configure watermarks using devlink-sb Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 10:11:20 +08:00
ocelot->devlink = felix->ds->devlink;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
port_phy_modes = kcalloc(num_phys_ports, sizeof(phy_interface_t),
GFP_KERNEL);
if (!port_phy_modes)
return -ENOMEM;
err = felix_parse_dt(felix, port_phy_modes);
if (err) {
kfree(port_phy_modes);
return err;
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
for (i = 0; i < TARGET_MAX; i++) {
struct regmap *target;
if (!felix->info->target_io_res[i].name)
continue;
memcpy(&res, &felix->info->target_io_res[i], sizeof(res));
res.flags = IORESOURCE_MEM;
res.start += felix->switch_base;
res.end += felix->switch_base;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
target = felix->info->init_regmap(ocelot, &res);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
if (IS_ERR(target)) {
dev_err(ocelot->dev,
"Failed to map device memory space\n");
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
kfree(port_phy_modes);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
return PTR_ERR(target);
}
ocelot->targets[i] = target;
}
err = ocelot_regfields_init(ocelot, felix->info->regfields);
if (err) {
dev_err(ocelot->dev, "failed to init reg fields map\n");
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
kfree(port_phy_modes);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
return err;
}
for (port = 0; port < num_phys_ports; port++) {
struct ocelot_port *ocelot_port;
struct regmap *target;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
ocelot_port = devm_kzalloc(ocelot->dev,
sizeof(struct ocelot_port),
GFP_KERNEL);
if (!ocelot_port) {
dev_err(ocelot->dev,
"failed to allocate port memory\n");
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
kfree(port_phy_modes);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
return -ENOMEM;
}
memcpy(&res, &felix->info->port_io_res[port], sizeof(res));
res.flags = IORESOURCE_MEM;
res.start += felix->switch_base;
res.end += felix->switch_base;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
target = felix->info->init_regmap(ocelot, &res);
if (IS_ERR(target)) {
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
dev_err(ocelot->dev,
"Failed to map memory space for port %d\n",
port);
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
kfree(port_phy_modes);
return PTR_ERR(target);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
ocelot_port->phy_mode = port_phy_modes[port];
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
ocelot_port->ocelot = ocelot;
ocelot_port->target = target;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
ocelot->ports[port] = ocelot_port;
}
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
kfree(port_phy_modes);
if (felix->info->mdio_bus_alloc) {
err = felix->info->mdio_bus_alloc(ocelot);
if (err < 0)
return err;
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
return 0;
}
net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent At present, when a PTP packet which requires TX timestamping gets dropped under congestion by the switch, things go downhill very fast. The driver keeps a clone of that skb in a queue of packets awaiting TX timestamp interrupts, but interrupts will never be raised for the dropped packets. Moreover, matching timestamped packets to timestamps is done by a 2-bit timestamp ID, and this can wrap around and we can match on the wrong skb. Since with the default NPI-based tagging protocol, we get no notification about packet drops, the best we can do is eventually recover from the drop of a PTP frame: its skb will be dead memory until another skb which was assigned the same timestamp ID happens to find it. However, with the ocelot-8021q tagger which injects packets using the manual register interface, it appears that we can check for more information, such as: - whether the input queue has reached the high watermark or not - whether the injection group's FIFO can accept additional data or not so we know that a PTP frame is likely to get dropped before actually sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet). But when we do that, we can also remove the skb from the timestamping queue, because there surely won't be any timestamp that matches it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 19:40:42 +08:00
static void ocelot_port_purge_txtstamp_skb(struct ocelot *ocelot, int port,
struct sk_buff *skb)
{
struct ocelot_port *ocelot_port = ocelot->ports[port];
struct sk_buff *clone = OCELOT_SKB_CB(skb)->clone;
struct sk_buff *skb_match = NULL, *skb_tmp;
unsigned long flags;
if (!clone)
return;
spin_lock_irqsave(&ocelot_port->tx_skbs.lock, flags);
skb_queue_walk_safe(&ocelot_port->tx_skbs, skb, skb_tmp) {
if (skb != clone)
continue;
__skb_unlink(skb, &ocelot_port->tx_skbs);
skb_match = skb;
break;
}
spin_unlock_irqrestore(&ocelot_port->tx_skbs.lock, flags);
WARN_ONCE(!skb_match,
"Could not find skb clone in TX timestamping list\n");
}
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 19:40:41 +08:00
#define work_to_xmit_work(w) \
container_of((w), struct felix_deferred_xmit_work, work)
static void felix_port_deferred_xmit(struct kthread_work *work)
{
struct felix_deferred_xmit_work *xmit_work = work_to_xmit_work(work);
struct dsa_switch *ds = xmit_work->dp->ds;
struct sk_buff *skb = xmit_work->skb;
u32 rew_op = ocelot_ptp_rew_op(skb);
struct ocelot *ocelot = ds->priv;
int port = xmit_work->dp->index;
int retries = 10;
do {
if (ocelot_can_inject(ocelot, 0))
break;
cpu_relax();
} while (--retries);
if (!retries) {
dev_err(ocelot->dev, "port %d failed to inject skb\n",
port);
net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent At present, when a PTP packet which requires TX timestamping gets dropped under congestion by the switch, things go downhill very fast. The driver keeps a clone of that skb in a queue of packets awaiting TX timestamp interrupts, but interrupts will never be raised for the dropped packets. Moreover, matching timestamped packets to timestamps is done by a 2-bit timestamp ID, and this can wrap around and we can match on the wrong skb. Since with the default NPI-based tagging protocol, we get no notification about packet drops, the best we can do is eventually recover from the drop of a PTP frame: its skb will be dead memory until another skb which was assigned the same timestamp ID happens to find it. However, with the ocelot-8021q tagger which injects packets using the manual register interface, it appears that we can check for more information, such as: - whether the input queue has reached the high watermark or not - whether the injection group's FIFO can accept additional data or not so we know that a PTP frame is likely to get dropped before actually sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet). But when we do that, we can also remove the skb from the timestamping queue, because there surely won't be any timestamp that matches it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 19:40:42 +08:00
ocelot_port_purge_txtstamp_skb(ocelot, port, skb);
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 19:40:41 +08:00
kfree_skb(skb);
return;
}
ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
consume_skb(skb);
kfree(xmit_work);
}
static int felix_connect_tag_protocol(struct dsa_switch *ds,
enum dsa_tag_protocol proto)
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 19:40:41 +08:00
{
struct ocelot_8021q_tagger_data *tagger_data;
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 19:40:41 +08:00
switch (proto) {
case DSA_TAG_PROTO_OCELOT_8021Q:
tagger_data = ocelot_8021q_tagger_data(ds);
tagger_data->xmit_work_fn = felix_port_deferred_xmit;
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 19:40:41 +08:00
return 0;
case DSA_TAG_PROTO_OCELOT:
case DSA_TAG_PROTO_SEVILLE:
return 0;
default:
return -EPROTONOSUPPORT;
}
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 19:40:41 +08:00
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
/* Hardware initialization done here so that we can allocate structures with
* devm without fear of dsa_register_switch returning -EPROBE_DEFER and causing
* us to allocate structures twice (leak memory) and map PCI memory twice
* (which will not work).
*/
static int felix_setup(struct dsa_switch *ds)
{
struct ocelot *ocelot = ds->priv;
struct felix *felix = ocelot_to_felix(ocelot);
net: dsa: felix: actually disable flooding towards NPI port The two blamed commits were written/tested individually but not together. When put together, commit 90897569beb1 ("net: dsa: felix: start off with flooding disabled on the CPU port"), which deletes a reinitialization of PGID_UC/PGID_MC/PGID_BC, is no longer sufficient to ensure that these port masks don't contain the CPU port module. This is because commit b903a6bd2e19 ("net: dsa: felix: migrate flood settings from NPI to tag_8021q CPU port") overwrites the hardware default settings towards the CPU port module with the settings that used to be present on the NPI port treated as a regular port. There, flooding is enabled, so flooding would get enabled on the CPU port module too. Adding conditional logic somewhere within felix_setup_tag_npi() to configure either the default no-flood policy or the flood policy inherited from the tag_8021q CPU port from a previous call to dsa_port_manage_cpu_flood() is getting complicated. So just let the migration logic do its thing during initial setup (which will temporarily turn on flooding), then turn flooding off for the NPI port after felix_set_tag_protocol() finishes. Here we are in felix_setup(), so the DSA slave interfaces are not yet created, and this doesn't affect traffic in any way. Fixes: 90897569beb1 ("net: dsa: felix: start off with flooding disabled on the CPU port") Fixes: b903a6bd2e19 ("net: dsa: felix: migrate flood settings from NPI to tag_8021q CPU port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-08 17:15:14 +08:00
unsigned long cpu_flood;
struct dsa_port *dp;
int err;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
err = felix_init_structs(felix, ds->num_ports);
if (err)
return err;
err = ocelot_init(ocelot);
if (err)
goto out_mdiobus_free;
if (ocelot->ptp) {
err = ocelot_init_timestamp(ocelot, felix->info->ptp_caps);
if (err) {
dev_err(ocelot->dev,
"Timestamp initialization failed\n");
ocelot->ptp = 0;
}
}
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
dsa_switch_for_each_available_port(dp, ds) {
ocelot_init_port(ocelot, dp->index);
/* Set the default QoS Classification based on PCP and DEI
* bits of vlan tag.
*/
felix_port_qos_map_init(ocelot, dp->index);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
net: mscc: ocelot: configure watermarks using devlink-sb Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 10:11:20 +08:00
err = ocelot_devlink_sb_register(ocelot);
if (err)
goto out_deinit_ports;
net: mscc: ocelot: configure watermarks using devlink-sb Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 10:11:20 +08:00
dsa_switch_for_each_cpu_port(dp, ds) {
/* The initial tag protocol is NPI which always returns 0, so
* there's no real point in checking for errors.
*/
felix_set_tag_protocol(ds, dp->index, felix->tag_proto);
net: dsa: felix: actually disable flooding towards NPI port The two blamed commits were written/tested individually but not together. When put together, commit 90897569beb1 ("net: dsa: felix: start off with flooding disabled on the CPU port"), which deletes a reinitialization of PGID_UC/PGID_MC/PGID_BC, is no longer sufficient to ensure that these port masks don't contain the CPU port module. This is because commit b903a6bd2e19 ("net: dsa: felix: migrate flood settings from NPI to tag_8021q CPU port") overwrites the hardware default settings towards the CPU port module with the settings that used to be present on the NPI port treated as a regular port. There, flooding is enabled, so flooding would get enabled on the CPU port module too. Adding conditional logic somewhere within felix_setup_tag_npi() to configure either the default no-flood policy or the flood policy inherited from the tag_8021q CPU port from a previous call to dsa_port_manage_cpu_flood() is getting complicated. So just let the migration logic do its thing during initial setup (which will temporarily turn on flooding), then turn flooding off for the NPI port after felix_set_tag_protocol() finishes. Here we are in felix_setup(), so the DSA slave interfaces are not yet created, and this doesn't affect traffic in any way. Fixes: 90897569beb1 ("net: dsa: felix: start off with flooding disabled on the CPU port") Fixes: b903a6bd2e19 ("net: dsa: felix: migrate flood settings from NPI to tag_8021q CPU port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-08 17:15:14 +08:00
/* Start off with flooding disabled towards the NPI port
* (actually CPU port module).
*/
cpu_flood = ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports));
ocelot_rmw_rix(ocelot, 0, cpu_flood, ANA_PGID_PGID, PGID_UC);
ocelot_rmw_rix(ocelot, 0, cpu_flood, ANA_PGID_PGID, PGID_MC);
break;
}
net: dsa: felix: Allow unknown unicast traffic towards the CPU port module Compared to other DSA switches, in the Ocelot cores, the RX filtering is a much more important concern. Firstly, the primary use case for Ocelot is non-DSA, so there isn't any secondary Ethernet MAC [the DSA master's one] to implicitly drop frames having a DMAC we are not interested in. So the switch driver itself needs to install FDB entries towards the CPU port module (PGID_CPU) for the MAC address of each switch port, in each VLAN installed on the port. Every address that is not whitelisted is implicitly dropped. This is in order to achieve a behavior similar to N standalone net devices. Secondly, even in the secondary use case of DSA, such as illustrated by Felix with the NPI port mode, that secondary Ethernet MAC is present, but its RX filter is bypassed. This is because the DSA tags themselves are placed before Ethernet, so the DMAC that the switch ports see is not seen by the DSA master too (since it's shifter to the right). So RX filtering is pretty important. A good RX filter won't bother the CPU in case the switch port receives a frame that it's not interested in, and there exists no other line of defense. Ocelot is pretty strict when it comes to RX filtering: non-IP multicast and broadcast traffic is allowed to go to the CPU port module, but unknown unicast isn't. This means that traffic reception for any other MAC addresses than the ones configured on each switch port net device won't work. This includes use cases such as macvlan or bridging with a non-Ocelot (so-called "foreign") interface. But this seems to be fine for the scenarios that the Linux system embedded inside an Ocelot switch is intended for - it is simply not interested in unknown unicast traffic, as explained in Allan Nielsen's presentation [0]. On the other hand, the Felix DSA switch is integrated in more general-purpose Linux systems, so it can't afford to drop that sort of traffic in hardware, even if it will end up doing so later, in software. Actually, unknown unicast means more for Felix than it does for Ocelot. Felix doesn't attempt to perform the whitelisting of switch port MAC addresses towards PGID_CPU at all, mainly because it is too complicated to be feasible: while the MAC addresses are unique in Ocelot, by default in DSA all ports are equal and inherited from the DSA master. This adds into account the question of reference counting MAC addresses (delayed ocelot_mact_forget), not to mention reference counting for the VLAN IDs that those MAC addresses are installed in. This reference counting should be done in the DSA core, and the fact that it wasn't needed so far is due to the fact that the other DSA switches don't have the DSA tag placed before Ethernet, so the DSA master is able to whitelist the MAC addresses in hardware. So this means that even regular traffic termination on a Felix switch port happens through flooding (because neither Felix nor Ocelot learn source MAC addresses from CPU-injected frames). So far we've explained that whitelisting towards PGID_CPU: - helps to reduce the likelihood of spamming the CPU with frames it won't process very far anyway - is implemented in the ocelot driver - is sufficient for the ocelot use cases - is not feasible in DSA - breaks use cases in DSA, in the current status (whitelisting enabled but no MAC address whitelisted) So the proposed patch allows unknown unicast frames to be sent to the CPU port module. This is done for the Felix DSA driver only, as Ocelot seems to be happy without it. [0]: https://www.youtube.com/watch?v=B1HhxEcU7Jg Suggested-by: Allan W. Nielsen <allan.nielsen@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Allan W. Nielsen <allan.nielsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-29 22:50:03 +08:00
ds->mtu_enforcement_ingress = true;
ds->assisted_learning_on_cpu_port = true;
net: mscc: ocelot: enforce FDB isolation when VLAN-unaware Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-25 17:22:25 +08:00
ds->fdb_isolation = true;
ds->max_num_bridges = ds->num_ports;
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
return 0;
out_deinit_ports:
dsa_switch_for_each_available_port(dp, ds)
ocelot_deinit_port(ocelot, dp->index);
ocelot_deinit_timestamp(ocelot);
ocelot_deinit(ocelot);
out_mdiobus_free:
if (felix->info->mdio_bus_free)
felix->info->mdio_bus_free(ocelot);
return err;
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
static void felix_teardown(struct dsa_switch *ds)
{
struct ocelot *ocelot = ds->priv;
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
struct felix *felix = ocelot_to_felix(ocelot);
struct dsa_port *dp;
net: dsa: felix: Add PCS operations for PHYLINK Layerscape SoCs traditionally expose the SerDes configuration/status for Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register format that is compatible with clause 22 or clause 45 (depending on SerDes protocol). Each MAC has its own internal MDIO bus on which there is one or more of these PCS's, responding to commands at a configurable PHY address. The per-port internal MDIO bus (which is just for PCSs) is totally separate and has nothing to do with the dedicated external MDIO controller (which is just for PHYs), but the register map for the MDIO controller is the same. The VSC9959 (Felix) switch instantiated in the LS1028A is integrated in hardware with the ENETC PCS of its DSA master, and reuses its MDIO controller driver, so Felix has been made to depend on it in Kconfig. +------------------------------------------------------------------------+ | +--------+ GMII (typically disabled via RCW) | | ENETC PCI | ENETC |--------------------------+ | | Root Complex | port 3 |-----------------------+ | | | Integrated +--------+ | | | | Endpoint | | | | +--------+ 2.5G GMII | | | | | ENETC |--------------+ | | | | | port 2 |-----------+ | | | | | +--------+ | | | | | | +--------+ +--------+ | | | Felix | | Felix | | | | port 4 | | port 5 | | | +--------+ +--------+ | | | | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | | | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | | | | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | | +------------------------------------------------------------------------+ | |||| SerDes | |||| |||| |||| |||| | | +--------+block | +--------------------------------------------+ | | | ENETC | | | ENETC port 2 internal MDIO bus | | | | port 0 | | | PCS PCS PCS PCS | | | | PCS | | | 0 1 2 3 | | +-----------------|------------------------------------------------------+ v v v v v v SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X USXGMII/ (bypasses 1000Base-X/ SerDes) 2500Base-X In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of the ENETC root complex, and has 2 BARs: - BAR 4: the switch's effective registers - BAR 0: the MDIO controller register map lended from ENETC port 2 (PF2), for accessing its associated PCS's. This explanation is necessary because the patch does some renaming "pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear a bit obtuse. The fact that the internal MDIO bus is "borrowed" is relevant because the register map is found in PF5 (the switch) but it triggers an access fault if PF2 (the ENETC DSA master) is not enabled. This is not treated in any way (and I don't think it can be treated). All of this is so SoC-specific, that it was contained as much as possible in the platform-integration file felix_vsc9959.c. We need to parse and pre-validate the device tree because of 2 reasons: - The PHY mode (SerDes protocol) cannot change at runtime due to SoC design. - There is a circular dependency in that we need to know what clause the PCS speaks in order to find it on the internal MDIO bus. But the clause of the PCS depends on what phy-mode it is configured for. The goal of this patch is to make steps towards removing the bootloader dependency for SGMII PCS pre-configuration, as well as to add support for monitoring the in-band SGMII AN between the PCS and the system-side link partner (PHY or other MAC). In practice the bootloader dependency is not completely removed. U-Boot pre-programs the PHY address at which each PCS can be found on the internal MDIO bus (MDEV_PORT). This is needed because the PCS of each port has the same out-of-reset PHY address of zero. The SerDes register for changing MDEV_PORT is pretty deep in the SoC (outside the addresses of the ENETC PCI BARs) and therefore inaccessible to us from here. Felix VSC9959 and Ocelot VSC7514 are integrated very differently in their respective SoCs, and for that reason Felix does not use the Ocelot core library for PHYLINK. On one hand we don't want to impose the fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't need to force the MAC link speed the way Ocelot does, since the MAC is connected to the PCS through a fixed GMII, and the PCS is the one who does the rate adaptation at lower link speeds, which the MAC does not even need to know about. In fact changing the GMII speed for Felix irrecoverably breaks transmission through that port until a reset. The pair with ENETC port 3 and Felix port 5 is optional and doesn't support tagging. When we enable it, swp5 is a regular slave port, albeit an internal one. The trouble is that it doesn't work, and that is because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave ports. So that is yet another reason for wanting to convert Felix to the native PHYLINK API. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 09:34:17 +08:00
dsa_switch_for_each_cpu_port(dp, ds) {
felix_del_tag_protocol(ds, dp->index, felix->tag_proto);
break;
}
dsa_switch_for_each_available_port(dp, ds)
ocelot_deinit_port(ocelot, dp->index);
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 19:40:41 +08:00
ocelot_devlink_sb_unregister(ocelot);
ocelot_deinit_timestamp(ocelot);
ocelot_deinit(ocelot);
if (felix->info->mdio_bus_free)
felix->info->mdio_bus_free(ocelot);
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
}
static int felix_hwtstamp_get(struct dsa_switch *ds, int port,
struct ifreq *ifr)
{
struct ocelot *ocelot = ds->priv;
return ocelot_hwstamp_get(ocelot, port, ifr);
}
static int felix_hwtstamp_set(struct dsa_switch *ds, int port,
struct ifreq *ifr)
{
struct ocelot *ocelot = ds->priv;
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
struct felix *felix = ocelot_to_felix(ocelot);
bool using_tag_8021q;
int err;
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
err = ocelot_hwstamp_set(ocelot, port, ifr);
if (err)
return err;
using_tag_8021q = felix->tag_proto == DSA_TAG_PROTO_OCELOT_8021Q;
return felix_update_trapping_destinations(ds, using_tag_8021q);
}
static bool felix_check_xtr_pkt(struct ocelot *ocelot)
net: dsa: tag_ocelot_8021q: add support for PTP timestamping For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:01 +08:00
{
struct felix *felix = ocelot_to_felix(ocelot);
int err = 0, grp = 0;
net: dsa: tag_ocelot_8021q: add support for PTP timestamping For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:01 +08:00
if (felix->tag_proto != DSA_TAG_PROTO_OCELOT_8021Q)
return false;
if (!felix->info->quirk_no_xtr_irq)
return false;
while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp)) {
struct sk_buff *skb;
unsigned int type;
err = ocelot_xtr_poll_frame(ocelot, grp, &skb);
if (err)
goto out;
/* We trap to the CPU port module all PTP frames, but
* felix_rxtstamp() only gets called for event frames.
* So we need to avoid sending duplicate general
* message frames by running a second BPF classifier
* here and dropping those.
*/
__skb_push(skb, ETH_HLEN);
type = ptp_classify_raw(skb);
__skb_pull(skb, ETH_HLEN);
if (type == PTP_CLASS_NONE) {
kfree_skb(skb);
continue;
}
netif_rx(skb);
}
out:
if (err < 0) {
dev_err_ratelimited(ocelot->dev,
"Error during packet extraction: %pe\n",
ERR_PTR(err));
net: dsa: tag_ocelot_8021q: add support for PTP timestamping For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:01 +08:00
ocelot_drain_cpu_queue(ocelot, 0);
}
net: dsa: tag_ocelot_8021q: add support for PTP timestamping For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:01 +08:00
return true;
}
static bool felix_rxtstamp(struct dsa_switch *ds, int port,
struct sk_buff *skb, unsigned int type)
{
net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge Normally it is expected that the dsa_device_ops :: rcv() method finishes parsing the DSA tag and consumes it, then never looks at it again. But commit c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix") added support for RX timestamping in a very unconventional way. On this switch, a partial timestamp is available in the DSA header, but the driver got away with not parsing that timestamp right away, but instead delayed that parsing for a little longer: dsa_switch_rcv(): nskb = cpu_dp->rcv(skb, dev); <------------- not here -> ocelot_rcv() ... skb = nskb; skb_push(skb, ETH_HLEN); skb->pkt_type = PACKET_HOST; skb->protocol = eth_type_trans(skb, skb->dev); ... if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here -> felix_rxtstamp() return 0; When in felix_rxtstamp(), this driver accounted for the fact that eth_type_trans() happened in the meanwhile, so it got a hold of the extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes from the current skb->data. This worked for quite some time but was quite fragile from the very beginning. Not to mention that having DSA tag parsing split in two different files, under different folders (net/dsa/tag_ocelot.c vs drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to come that they might break this. Finally, the blamed commit does the following: at the end of ocelot_rcv(), it checks whether the skb payload contains a VLAN header. If it does, and this port is under a VLAN-aware bridge, that VLAN ID might not be correct in the sense that the packet might have suffered VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID from the skb payload using __skb_vlan_pop(), and take the classified VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the classified VLAN, and the skb payload is VLAN-untagged. The big problem is that __skb_vlan_pop() does: memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); __skb_pull(skb, VLAN_HLEN); aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes from the skb headroom (effectively also moving skb->data, by definition). So for felix_rxtstamp()'s fragile logic, all bets are off now. Instead of having the "extraction" pointer point to the DSA header, it actually points to 4 bytes _inside_ the extraction header. Corollary, the last 4 bytes of the "extraction" header are in fact 4 stale bytes of the destination MAC address from the Ethernet header, from prior to the __skb_vlan_pop() movement. So of course, RX timestamps are completely bogus when the system is configured in this way. The fix is actually very simple: just don't structure the code like that. For better or worse, the DSA PTP timestamping API does not offer a straightforward way for drivers to present their RX timestamps, but other drivers (sja1105) have established a simple mechanism to carry their RX timestamp from dsa_device_ops :: rcv() all the way to dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to simply save the partial timestamp to the skb->cb, and complete it later. Question: why don't we simply populate the skb's struct skb_shared_hwtstamps from ocelot_rcv(), and bother with this complication of propagating the timestamp to felix_rxtstamp()? Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether PTP packets need sleepable context to retrieve the full RX timestamp. Currently felix_rxtstamp() answers "no, thanks" to that question, and calls ocelot_ptp_gettime64() from softirq atomic context. This is understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so hardware access does not require sleeping. But the felix driver is preparing for the introduction of other switches where hardware access is over a slow bus like SPI or MDIO: https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/ So I would like to keep this code structure, so the rework needed when that driver will need PTP support will be minimal (answer "yes, I need deferred context for this skb's RX timestamp", then the partial timestamp will still be found in the skb->cb. Fixes: ea440cd2d9b2 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available") Reported-by: Po Liu <po.liu@nxp.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-03 03:31:22 +08:00
u32 tstamp_lo = OCELOT_SKB_CB(skb)->tstamp_lo;
struct skb_shared_hwtstamps *shhwtstamps;
struct ocelot *ocelot = ds->priv;
struct timespec64 ts;
net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge Normally it is expected that the dsa_device_ops :: rcv() method finishes parsing the DSA tag and consumes it, then never looks at it again. But commit c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix") added support for RX timestamping in a very unconventional way. On this switch, a partial timestamp is available in the DSA header, but the driver got away with not parsing that timestamp right away, but instead delayed that parsing for a little longer: dsa_switch_rcv(): nskb = cpu_dp->rcv(skb, dev); <------------- not here -> ocelot_rcv() ... skb = nskb; skb_push(skb, ETH_HLEN); skb->pkt_type = PACKET_HOST; skb->protocol = eth_type_trans(skb, skb->dev); ... if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here -> felix_rxtstamp() return 0; When in felix_rxtstamp(), this driver accounted for the fact that eth_type_trans() happened in the meanwhile, so it got a hold of the extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes from the current skb->data. This worked for quite some time but was quite fragile from the very beginning. Not to mention that having DSA tag parsing split in two different files, under different folders (net/dsa/tag_ocelot.c vs drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to come that they might break this. Finally, the blamed commit does the following: at the end of ocelot_rcv(), it checks whether the skb payload contains a VLAN header. If it does, and this port is under a VLAN-aware bridge, that VLAN ID might not be correct in the sense that the packet might have suffered VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID from the skb payload using __skb_vlan_pop(), and take the classified VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the classified VLAN, and the skb payload is VLAN-untagged. The big problem is that __skb_vlan_pop() does: memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); __skb_pull(skb, VLAN_HLEN); aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes from the skb headroom (effectively also moving skb->data, by definition). So for felix_rxtstamp()'s fragile logic, all bets are off now. Instead of having the "extraction" pointer point to the DSA header, it actually points to 4 bytes _inside_ the extraction header. Corollary, the last 4 bytes of the "extraction" header are in fact 4 stale bytes of the destination MAC address from the Ethernet header, from prior to the __skb_vlan_pop() movement. So of course, RX timestamps are completely bogus when the system is configured in this way. The fix is actually very simple: just don't structure the code like that. For better or worse, the DSA PTP timestamping API does not offer a straightforward way for drivers to present their RX timestamps, but other drivers (sja1105) have established a simple mechanism to carry their RX timestamp from dsa_device_ops :: rcv() all the way to dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to simply save the partial timestamp to the skb->cb, and complete it later. Question: why don't we simply populate the skb's struct skb_shared_hwtstamps from ocelot_rcv(), and bother with this complication of propagating the timestamp to felix_rxtstamp()? Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether PTP packets need sleepable context to retrieve the full RX timestamp. Currently felix_rxtstamp() answers "no, thanks" to that question, and calls ocelot_ptp_gettime64() from softirq atomic context. This is understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so hardware access does not require sleeping. But the felix driver is preparing for the introduction of other switches where hardware access is over a slow bus like SPI or MDIO: https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/ So I would like to keep this code structure, so the rework needed when that driver will need PTP support will be minimal (answer "yes, I need deferred context for this skb's RX timestamp", then the partial timestamp will still be found in the skb->cb. Fixes: ea440cd2d9b2 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available") Reported-by: Po Liu <po.liu@nxp.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-03 03:31:22 +08:00
u32 tstamp_hi;
u64 tstamp;
net: dsa: tag_ocelot_8021q: add support for PTP timestamping For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:01 +08:00
/* If the "no XTR IRQ" workaround is in use, tell DSA to defer this skb
* for RX timestamping. Then free it, and poll for its copy through
* MMIO in the CPU port module, and inject that into the stack from
* ocelot_xtr_poll().
*/
if (felix_check_xtr_pkt(ocelot)) {
net: dsa: tag_ocelot_8021q: add support for PTP timestamping For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14 06:38:01 +08:00
kfree_skb(skb);
return true;
}
ocelot_ptp_gettime64(&ocelot->ptp_info, &ts);
tstamp = ktime_set(ts.tv_sec, ts.tv_nsec);
tstamp_hi = tstamp >> 32;
if ((tstamp & 0xffffffff) < tstamp_lo)
tstamp_hi--;
tstamp = ((u64)tstamp_hi << 32) | tstamp_lo;
shhwtstamps = skb_hwtstamps(skb);
memset(shhwtstamps, 0, sizeof(struct skb_shared_hwtstamps));
shhwtstamps->hwtstamp = tstamp;
return false;
}
static void felix_txtstamp(struct dsa_switch *ds, int port,
struct sk_buff *skb)
{
struct ocelot *ocelot = ds->priv;
struct sk_buff *clone = NULL;
if (!ocelot->ptp)
return;
if (ocelot_port_txtstamp_request(ocelot, port, skb, &clone)) {
dev_err_ratelimited(ds->dev,
"port %d delivering skb without TX timestamp\n",
port);
return;
}
if (clone)
OCELOT_SKB_CB(skb)->clone = clone;
}
static int felix_change_mtu(struct dsa_switch *ds, int port, int new_mtu)
{
struct ocelot *ocelot = ds->priv;
ocelot_port_set_maxlen(ocelot, port, new_mtu);
return 0;
}
static int felix_get_max_mtu(struct dsa_switch *ds, int port)
{
struct ocelot *ocelot = ds->priv;
return ocelot_get_max_mtu(ocelot, port);
}
static int felix_cls_flower_add(struct dsa_switch *ds, int port,
struct flow_cls_offload *cls, bool ingress)
{
struct ocelot *ocelot = ds->priv;
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
struct felix *felix = ocelot_to_felix(ocelot);
bool using_tag_8021q;
int err;
err = ocelot_cls_flower_replace(ocelot, port, cls, ingress);
if (err)
return err;
using_tag_8021q = felix->tag_proto == DSA_TAG_PROTO_OCELOT_8021Q;
net: dsa: felix: update destinations of existing traps with ocelot-8021q Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 22:30:13 +08:00
return felix_update_trapping_destinations(ds, using_tag_8021q);
}
static int felix_cls_flower_del(struct dsa_switch *ds, int port,
struct flow_cls_offload *cls, bool ingress)
{
struct ocelot *ocelot = ds->priv;
return ocelot_cls_flower_destroy(ocelot, port, cls, ingress);
}
static int felix_cls_flower_stats(struct dsa_switch *ds, int port,
struct flow_cls_offload *cls, bool ingress)
{
struct ocelot *ocelot = ds->priv;
return ocelot_cls_flower_stats(ocelot, port, cls, ingress);
}
static int felix_port_policer_add(struct dsa_switch *ds, int port,
struct dsa_mall_policer_tc_entry *policer)
{
struct ocelot *ocelot = ds->priv;
struct ocelot_policer pol = {
.rate = div_u64(policer->rate_bytes_per_sec, 1000) * 8,
.burst = policer->burst,
};
return ocelot_port_policer_add(ocelot, port, &pol);
}
static void felix_port_policer_del(struct dsa_switch *ds, int port)
{
struct ocelot *ocelot = ds->priv;
ocelot_port_policer_del(ocelot, port);
}
static int felix_port_mirror_add(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror,
bool ingress, struct netlink_ext_ack *extack)
{
struct ocelot *ocelot = ds->priv;
return ocelot_port_mirror_add(ocelot, port, mirror->to_local_port,
ingress, extack);
}
static void felix_port_mirror_del(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror)
{
struct ocelot *ocelot = ds->priv;
ocelot_port_mirror_del(ocelot, port, mirror->ingress);
}
static int felix_port_setup_tc(struct dsa_switch *ds, int port,
enum tc_setup_type type,
void *type_data)
{
struct ocelot *ocelot = ds->priv;
struct felix *felix = ocelot_to_felix(ocelot);
if (felix->info->port_setup_tc)
return felix->info->port_setup_tc(ds, port, type, type_data);
else
return -EOPNOTSUPP;
}
net: mscc: ocelot: configure watermarks using devlink-sb Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 10:11:20 +08:00
static int felix_sb_pool_get(struct dsa_switch *ds, unsigned int sb_index,
u16 pool_index,
struct devlink_sb_pool_info *pool_info)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_pool_get(ocelot, sb_index, pool_index, pool_info);
}
static int felix_sb_pool_set(struct dsa_switch *ds, unsigned int sb_index,
u16 pool_index, u32 size,
enum devlink_sb_threshold_type threshold_type,
struct netlink_ext_ack *extack)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_pool_set(ocelot, sb_index, pool_index, size,
threshold_type, extack);
}
static int felix_sb_port_pool_get(struct dsa_switch *ds, int port,
unsigned int sb_index, u16 pool_index,
u32 *p_threshold)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_port_pool_get(ocelot, port, sb_index, pool_index,
p_threshold);
}
static int felix_sb_port_pool_set(struct dsa_switch *ds, int port,
unsigned int sb_index, u16 pool_index,
u32 threshold, struct netlink_ext_ack *extack)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_port_pool_set(ocelot, port, sb_index, pool_index,
threshold, extack);
}
static int felix_sb_tc_pool_bind_get(struct dsa_switch *ds, int port,
unsigned int sb_index, u16 tc_index,
enum devlink_sb_pool_type pool_type,
u16 *p_pool_index, u32 *p_threshold)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_tc_pool_bind_get(ocelot, port, sb_index, tc_index,
pool_type, p_pool_index,
p_threshold);
}
static int felix_sb_tc_pool_bind_set(struct dsa_switch *ds, int port,
unsigned int sb_index, u16 tc_index,
enum devlink_sb_pool_type pool_type,
u16 pool_index, u32 threshold,
struct netlink_ext_ack *extack)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_tc_pool_bind_set(ocelot, port, sb_index, tc_index,
pool_type, pool_index, threshold,
extack);
}
static int felix_sb_occ_snapshot(struct dsa_switch *ds,
unsigned int sb_index)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_occ_snapshot(ocelot, sb_index);
}
static int felix_sb_occ_max_clear(struct dsa_switch *ds,
unsigned int sb_index)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_occ_max_clear(ocelot, sb_index);
}
static int felix_sb_occ_port_pool_get(struct dsa_switch *ds, int port,
unsigned int sb_index, u16 pool_index,
u32 *p_cur, u32 *p_max)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_occ_port_pool_get(ocelot, port, sb_index, pool_index,
p_cur, p_max);
}
static int felix_sb_occ_tc_port_bind_get(struct dsa_switch *ds, int port,
unsigned int sb_index, u16 tc_index,
enum devlink_sb_pool_type pool_type,
u32 *p_cur, u32 *p_max)
{
struct ocelot *ocelot = ds->priv;
return ocelot_sb_occ_tc_port_bind_get(ocelot, port, sb_index, tc_index,
pool_type, p_cur, p_max);
}
static int felix_mrp_add(struct dsa_switch *ds, int port,
const struct switchdev_obj_mrp *mrp)
{
struct ocelot *ocelot = ds->priv;
return ocelot_mrp_add(ocelot, port, mrp);
}
static int felix_mrp_del(struct dsa_switch *ds, int port,
const struct switchdev_obj_mrp *mrp)
{
struct ocelot *ocelot = ds->priv;
return ocelot_mrp_add(ocelot, port, mrp);
}
static int
felix_mrp_add_ring_role(struct dsa_switch *ds, int port,
const struct switchdev_obj_ring_role_mrp *mrp)
{
struct ocelot *ocelot = ds->priv;
return ocelot_mrp_add_ring_role(ocelot, port, mrp);
}
static int
felix_mrp_del_ring_role(struct dsa_switch *ds, int port,
const struct switchdev_obj_ring_role_mrp *mrp)
{
struct ocelot *ocelot = ds->priv;
return ocelot_mrp_del_ring_role(ocelot, port, mrp);
}
net: dsa: felix: configure default-prio and dscp priorities Follow the established programming model for this driver and provide shims in the felix DSA driver which call the implementations from the ocelot switch lib. The ocelot switchdev driver wasn't integrated with dcbnl due to lack of hardware availability. The switch doesn't have any fancy QoS classification enabled by default. The provided getters will create a default-prio app table entry of 0, and no dscp entry. However, the getters have been made to actually retrieve the hardware configuration rather than static values, to be future proof in case DSA will need this information from more call paths. For default-prio, there is a single field per port, in ANA_PORT_QOS_CFG, called QOS_DEFAULT_VAL. DSCP classification is enabled per-port, again via ANA_PORT_QOS_CFG (field QOS_DSCP_ENA), and individual DSCP values are configured as trusted or not through register ANA_DSCP_CFG (replicated 64 times). An untrusted DSCP value falls back to other QoS classification methods. If trusted, the selected ANA_DSCP_CFG register also holds the QoS class in the QOS_DSCP_VAL field. The hardware also supports DSCP remapping (DSCP value X is translated to DSCP value Y before the QoS class is determined based on the app table entry for Y) and DSCP packet rewriting. The dcbnl framework, for being so flexible in other useless areas, doesn't appear to support this. So this functionality has been left out. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-12 05:15:20 +08:00
static int felix_port_get_default_prio(struct dsa_switch *ds, int port)
{
struct ocelot *ocelot = ds->priv;
return ocelot_port_get_default_prio(ocelot, port);
}
static int felix_port_set_default_prio(struct dsa_switch *ds, int port,
u8 prio)
{
struct ocelot *ocelot = ds->priv;
return ocelot_port_set_default_prio(ocelot, port, prio);
}
static int felix_port_get_dscp_prio(struct dsa_switch *ds, int port, u8 dscp)
{
struct ocelot *ocelot = ds->priv;
return ocelot_port_get_dscp_prio(ocelot, port, dscp);
}
static int felix_port_add_dscp_prio(struct dsa_switch *ds, int port, u8 dscp,
u8 prio)
{
struct ocelot *ocelot = ds->priv;
return ocelot_port_add_dscp_prio(ocelot, port, dscp, prio);
}
static int felix_port_del_dscp_prio(struct dsa_switch *ds, int port, u8 dscp,
u8 prio)
{
struct ocelot *ocelot = ds->priv;
return ocelot_port_del_dscp_prio(ocelot, port, dscp, prio);
}
const struct dsa_switch_ops felix_switch_ops = {
.get_tag_protocol = felix_get_tag_protocol,
.change_tag_protocol = felix_change_tag_protocol,
.connect_tag_protocol = felix_connect_tag_protocol,
.setup = felix_setup,
.teardown = felix_teardown,
.set_ageing_time = felix_set_ageing_time,
.get_strings = felix_get_strings,
.get_ethtool_stats = felix_get_ethtool_stats,
.get_sset_count = felix_get_sset_count,
.get_ts_info = felix_get_ts_info,
.phylink_get_caps = felix_phylink_get_caps,
.phylink_validate = felix_phylink_validate,
.phylink_mac_select_pcs = felix_phylink_mac_select_pcs,
.phylink_mac_link_down = felix_phylink_mac_link_down,
.phylink_mac_link_up = felix_phylink_mac_link_up,
.port_fast_age = felix_port_fast_age,
.port_fdb_dump = felix_fdb_dump,
.port_fdb_add = felix_fdb_add,
.port_fdb_del = felix_fdb_del,
.lag_fdb_add = felix_lag_fdb_add,
.lag_fdb_del = felix_lag_fdb_del,
.port_mdb_add = felix_mdb_add,
.port_mdb_del = felix_mdb_del,
.port_pre_bridge_flags = felix_pre_bridge_flags,
.port_bridge_flags = felix_bridge_flags,
.port_bridge_join = felix_bridge_join,
.port_bridge_leave = felix_bridge_leave,
.port_lag_join = felix_lag_join,
.port_lag_leave = felix_lag_leave,
.port_lag_change = felix_lag_change,
.port_stp_state_set = felix_bridge_stp_state_set,
.port_vlan_filtering = felix_vlan_filtering,
.port_vlan_add = felix_vlan_add,
.port_vlan_del = felix_vlan_del,
.port_hwtstamp_get = felix_hwtstamp_get,
.port_hwtstamp_set = felix_hwtstamp_set,
.port_rxtstamp = felix_rxtstamp,
.port_txtstamp = felix_txtstamp,
.port_change_mtu = felix_change_mtu,
.port_max_mtu = felix_get_max_mtu,
.port_policer_add = felix_port_policer_add,
.port_policer_del = felix_port_policer_del,
.port_mirror_add = felix_port_mirror_add,
.port_mirror_del = felix_port_mirror_del,
.cls_flower_add = felix_cls_flower_add,
.cls_flower_del = felix_cls_flower_del,
.cls_flower_stats = felix_cls_flower_stats,
.port_setup_tc = felix_port_setup_tc,
net: mscc: ocelot: configure watermarks using devlink-sb Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 10:11:20 +08:00
.devlink_sb_pool_get = felix_sb_pool_get,
.devlink_sb_pool_set = felix_sb_pool_set,
.devlink_sb_port_pool_get = felix_sb_port_pool_get,
.devlink_sb_port_pool_set = felix_sb_port_pool_set,
.devlink_sb_tc_pool_bind_get = felix_sb_tc_pool_bind_get,
.devlink_sb_tc_pool_bind_set = felix_sb_tc_pool_bind_set,
.devlink_sb_occ_snapshot = felix_sb_occ_snapshot,
.devlink_sb_occ_max_clear = felix_sb_occ_max_clear,
.devlink_sb_occ_port_pool_get = felix_sb_occ_port_pool_get,
.devlink_sb_occ_tc_port_bind_get= felix_sb_occ_tc_port_bind_get,
.port_mrp_add = felix_mrp_add,
.port_mrp_del = felix_mrp_del,
.port_mrp_add_ring_role = felix_mrp_add_ring_role,
.port_mrp_del_ring_role = felix_mrp_del_ring_role,
.tag_8021q_vlan_add = felix_tag_8021q_vlan_add,
.tag_8021q_vlan_del = felix_tag_8021q_vlan_del,
net: dsa: felix: configure default-prio and dscp priorities Follow the established programming model for this driver and provide shims in the felix DSA driver which call the implementations from the ocelot switch lib. The ocelot switchdev driver wasn't integrated with dcbnl due to lack of hardware availability. The switch doesn't have any fancy QoS classification enabled by default. The provided getters will create a default-prio app table entry of 0, and no dscp entry. However, the getters have been made to actually retrieve the hardware configuration rather than static values, to be future proof in case DSA will need this information from more call paths. For default-prio, there is a single field per port, in ANA_PORT_QOS_CFG, called QOS_DEFAULT_VAL. DSCP classification is enabled per-port, again via ANA_PORT_QOS_CFG (field QOS_DSCP_ENA), and individual DSCP values are configured as trusted or not through register ANA_DSCP_CFG (replicated 64 times). An untrusted DSCP value falls back to other QoS classification methods. If trusted, the selected ANA_DSCP_CFG register also holds the QoS class in the QOS_DSCP_VAL field. The hardware also supports DSCP remapping (DSCP value X is translated to DSCP value Y before the QoS class is determined based on the app table entry for Y) and DSCP packet rewriting. The dcbnl framework, for being so flexible in other useless areas, doesn't appear to support this. So this functionality has been left out. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-12 05:15:20 +08:00
.port_get_default_prio = felix_port_get_default_prio,
.port_set_default_prio = felix_port_set_default_prio,
.port_get_dscp_prio = felix_port_get_dscp_prio,
.port_add_dscp_prio = felix_port_add_dscp_prio,
.port_del_dscp_prio = felix_port_del_dscp_prio,
net: dsa: felix: manage host flooding using a specific driver callback At the time - commit 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") - not introducing a dedicated switch callback for host flooding made sense, because for the only user, the felix driver, there was nothing different to do for the CPU port than set the flood flags on the CPU port just like on any other bridge port. There are 2 reasons why this approach is not good enough, however. (1) Other drivers, like sja1105, support configuring flooding as a function of {ingress port, egress port}, whereas the DSA ->port_bridge_flags() function only operates on an egress port. So with that driver we'd have useless host flooding from user ports which don't need it. (2) Even with the felix driver, support for multiple CPU ports makes it difficult to piggyback on ->port_bridge_flags(). The way in which the felix driver is going to support host-filtered addresses with multiple CPU ports is that it will direct these addresses towards both CPU ports (in a sort of multicast fashion), then restrict the forwarding to only one of the two using the forwarding masks. Consequently, flooding will also be enabled towards both CPU ports. However, ->port_bridge_flags() gets passed the index of a single CPU port, and that leaves the flood settings out of sync between the 2 CPU ports. This is to say, it's better to have a specific driver method for host flooding, which takes the user port as argument. This solves problem (1) by allowing the driver to do different things for different user ports, and problem (2) by abstracting the operation and letting the driver do whatever, rather than explicitly making the DSA core point to the CPU port it thinks needs to be touched. This new method also creates a problem, which is that cross-chip setups are not handled. However I don't have hardware right now where I can test what is the proper thing to do, and there isn't hardware compatible with multi-switch trees that supports host flooding. So it remains a problem to be tackled in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11 17:50:17 +08:00
.port_set_host_flood = felix_port_set_host_flood,
net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 23:03:30 +08:00
};
struct net_device *felix_port_to_netdev(struct ocelot *ocelot, int port)
{
struct felix *felix = ocelot_to_felix(ocelot);
struct dsa_switch *ds = felix->ds;
if (!dsa_is_user_port(ds, port))
return NULL;
return dsa_to_port(ds, port)->slave;
}
int felix_netdev_to_port(struct net_device *dev)
{
struct dsa_port *dp;
dp = dsa_port_from_netdev(dev);
if (IS_ERR(dp))
return -EINVAL;
return dp->index;
}