OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Dimitris Michailidis	e1ffcc6681	net/fungible: Add service module for Fungible drivers Fungible cards have a number of different PCI functions and thus different drivers, all of which use a common method to initialize and interact with the device. This commit adds a library module that collects these common mechanisms. They mainly deal with device initialization, setting up and destroying queues, and operating an admin queue. A subset of the FW interface is also included here. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-27 10:51:23 +00:00
Dimitris Michailidis	e8eb9e3299	PCI: Add Fungible Vendor ID to pci_ids.h Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-27 10:51:23 +00:00
David S. Miller	4aaa489538	Merge branch 'ip-neigh-skb-reason' Menglong Dong says: ==================== net: use kfree_skb_reason() for ip/neighbour In the series "net: use kfree_skb_reason() for ip/udp packet receive", reasons for skb drops are added to the packet receive process of IP layer. Link: https://lore.kernel.org/netdev/20220205074739.543606-1-imagedong@tencent.com/ And in the first patch of this series, skb drop reasons are added to the packet egress path of IP layer. As kfree_skb() is not used frequently, I commit these changes at once and didn't create a patch for every function that is involved. Following functions are handled: __ip_queue_xmit() ip_finish_output() ip_mc_finish_output() ip6_output() ip6_finish_output() ip6_finish_output2() Following new drop reasons are introduced (their meaning is given in their documentation): SKB_DROP_REASON_IP_OUTNOROUTES SKB_DROP_REASON_BPF_CGROUP_EGRESS SKB_DROP_REASON_IPV6DISABLED SKB_DROP_REASON_NEIGH_CREATEFAIL In the 2nd and 3rd patches, kfree_skb_reason() is used in neighbour subsystem instead of kfree_skb(). __neigh_event_send() and arp_error_report() are involved, and following new drop reasons are introduced: SKB_DROP_REASON_NEIGH_FAILED SKB_DROP_REASON_NEIGH_QUEUEFULL SKB_DROP_REASON_NEIGH_DEAD Changes since v2: - fix typo in the 1st patch of 'SKB_DROP_REASON_IPV6DSIABLED' reported by Roman Changes since v1: - introduce SKB_DROP_REASON_NEIGH_CREATEFAIL for some path in the 1st patch - introduce SKB_DROP_REASON_NEIGH_DEAD in the 2nd patch - simplify the document for the new drop reasons, as David Ahern suggested ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-26 12:53:59 +00:00
Menglong Dong	56d4b4e48a	net: neigh: add skb drop reasons to arp_error_report() When a neighbour becomes invalid or is destroyed, neigh_invalidate() will be called. neigh->ops->error_report() will be called if the neighbour's state is NUD_FAILED, and this seems to be the only use of error_report(). So we can tell that the reason of skb drops in arp_error_report() is SKB_DROP_REASON_NEIGH_FAILED. Replace kfree_skb() used in arp_error_report() with kfree_skb_reason(). Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-26 12:53:59 +00:00
Menglong Dong	a5736edda1	net: neigh: use kfree_skb_reason() for __neigh_event_send() Replace kfree_skb() used in __neigh_event_send() with kfree_skb_reason(). Following drop reasons are added: SKB_DROP_REASON_NEIGH_FAILED SKB_DROP_REASON_NEIGH_QUEUEFULL SKB_DROP_REASON_NEIGH_DEAD The first two reasons above should be the hot path that skb drops in neighbour subsystem. Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-26 12:53:59 +00:00
Menglong Dong	5e187189ec	net: ip: add skb drop reasons for ip egress path Replace kfree_skb() which is used in the packet egress path of IP layer with kfree_skb_reason(). Functions that are involved include: __ip_queue_xmit() ip_finish_output() ip_mc_finish_output() ip6_output() ip6_finish_output() ip6_finish_output2() Following new drop reasons are introduced: SKB_DROP_REASON_IP_OUTNOROUTES SKB_DROP_REASON_BPF_CGROUP_EGRESS SKB_DROP_REASON_IPV6DISABLED SKB_DROP_REASON_NEIGH_CREATEFAIL Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-26 12:53:58 +00:00
David S. Miller	0cc70c6eec	Merge branch 'dsa-ocelot-phylink-updates' Russell King says: ==================== net: dsa: ocelot: phylink updates This series updates the Ocelot DSA driver for some of the recent phylink changes. Specifically, we fill in the supported_interfaces fields, convert to mac_select_pcs and mark the driver as non-legacy. We do not convert to phylink_generic_validate() as Ocelot has special support for its rate adapting PCS which makes the generic validate method unsuitable for this driver. The three changes mentioned above are implemented in their own separate patches with one additional cleanup: 1) Populate the supported_interfaces bitmap 2) Remove the now unnecessary interface checks in the validate methods 3) Convert from phylink_set_pcs() to .mac_select_pcs. 4) Mark the driver as non-legacy Thanks. RFC -> non-RFC: add reviewed-by/tested-by's, update patch 1 to set the supported_interfaces bitmap in felix.c rather than the sub-drivers as requested by Vladimir. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-26 12:44:29 +00:00
Russell King (Oracle)	f6f04c0204	net: dsa: ocelot: mark as non-legacy The ocelot DSA driver does not make use of the speed, duplex, pause or advertisement in its phylink_mac_config() implementation, so it can be marked as a non-legacy driver. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-26 12:44:29 +00:00
Russell King (Oracle)	864ba485ac	net: dsa: ocelot: convert to mac_select_pcs() Convert the PCS selection to use mac_select_pcs, which allows the PCS to perform any validation it needs, and removes the need to set the PCS in the mac_config() callback, delving into the higher DSA levels to do so. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-26 12:44:28 +00:00
Russell King (Oracle)	e57a15401e	net: dsa: ocelot: remove interface checks When the supported interfaces bitmap is populated, phylink will itself check that the interface mode is present in this bitmap. Drivers no longer need to perform this check themselves. Remove these checks. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-26 12:44:28 +00:00
Russell King (Oracle)	79fda660bd	net: dsa: ocelot: populate supported_interfaces Populate the supported interfaces bitmap for the Ocelot DSA switches. Since all sub-drivers only support a single interface mode, defined by ocelot_port->phy_mode, we can handle this in the main driver code without reference to the sub-driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-26 12:44:28 +00:00
Jakub Kicinski	3e120e4580	Merge branch 'small-fixes-for-mctp' Matt Johnston says: ==================== Small fixes for MCTP This series has 3 fixes for MCTP. ==================== Link: https://lore.kernel.org/r/20220225053938.643605-1-matt@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-25 22:23:37 -08:00
Matt Johnston	33f5d1a9d9	mctp i2c: Fix hard head TX bounds length check We should be testing the length before fitting into the u8 byte_count. This is just a sanity check, the MCTP stack should have limited to MTU which is checked, and we check consistency later in mctp_i2c_xmit(). Found by Smatch mctp_i2c_header_create() warn: impossible condition '(hdr->byte_count > 255) => (0-255 > 255)' Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-25 22:23:33 -08:00
Matt Johnston	06bf1ce69d	mctp i2c: Fix potential use-after-free The skb is handed off to netif_rx() which may free it. Found by Smatch. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-25 22:23:29 -08:00
Matt Johnston	f62457df5c	mctp: Avoid warning if unregister notifies twice Previously if an unregister notify handler ran twice (waiting for netdev to be released) it would print a warning in mctp_unregister() every subsequent time the unregister notify occurred. Instead we only need to worry about the case where a mctp_ptr is set on an unknown device type. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-25 22:23:23 -08:00
Wong Vee Khee	23d7433011	stmmac: intel: Enable 2.5Gbps for Intel AlderLake-S Intel AlderLake-S platform is capable of running on 2.5Gbps link speed. This patch enables 2.5Gbps link speed on AlderLake-S platform. Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Link: https://lore.kernel.org/r/20220225023325.474242-1-vee.khee.wong@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-25 22:22:09 -08:00
Colin Ian King	38455fbcc8	net: dsa: qca8k: return with -EINVAL on invalid port Currently an invalid port throws a WARN_ON warning however invalid uninitialized values in reg and cpu_port_index are being used later on. Fix this by returning -EINVAL for an invalid port value. Addresses clang-scan warnings: drivers/net/dsa/qca8k.c:1981:3: warning: 2nd function call argument is an uninitialized value [core.CallAndMessage] drivers/net/dsa/qca8k.c:1999:9: warning: 2nd function call argument is an uninitialized value [core.CallAndMessage] Fixes: `7544b3ff74` ("net: dsa: qca8k: move pcs configuration") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220224220557.147075-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-25 22:21:24 -08:00
David S. Miller	5ebaaa69bd	Merge branch 'sja1105-phylink-updates' Russell King says: ==================== net: dsa: sja1105: phylink updates This series updates the phylink implementation in sja1105 to use the supported_interfaces bitmap, convert to the mac_select_pcs() interface, mark as non-legacy, and get rid of the validation method. As a final step, enable switching between SGMII and 2500BASE-X as it is a feature that Vladimir desires. Specifically, the patches in this series: 1. Populates the supported_interfaces bitmap. 2. As a result of the supported_interfaces bitmap being populated, sja1105 no longer needs to check the interface mode as phylink will do this. 3. Switch away from using phylink_set_pcs(), using the mac_select_pcs() method instead. 4. Mark the driver as not-legacy 5. Fill in mac_capabilities using _exactly_ the same conditions as is currently used to decide which link modes to support, and convert to use phylink_generic_validate() 6. Add brand new support to permit switching between SGMII and 2500BASE-X modes of operation as per Vladimir's single patch that performs steps 1, 2, 5 and 6 in one go. There are some additional changes in Vladimir's single patch that I have not included: * validation of priv->phy_mode[] in sja1105_phylink_get_caps(). The driver has already validated the phy_mode for each port in sja1105_init_mii_settings(), and a failure here will prevent the driver reaching sja1105_phylink_get_caps(). * Changing the decisions on which mac_capabilities to set. Vladimir's patch always sets MAC_10FD | MAC_100FD | MAC_1000FD despite the current code clearly making the 1G speed conditional on the xmii_mode for the port. The change in decision making may be visible when in PHY_INTERFACE_MODE_INTERNAL mode, for which the phylink_generic_validate() will pass through all the MAC capabilities as ethtool link modes.
Hence, if we have PHY_INTERFACE_MODE_INTERNAL but supports_rgmii[] or supports_sgmii[] is non-zero, currently we do not get 1G speeds. With Vladimir's additional change, we will get 1G speeds. While it is not clear whether that can happen, I feel changing the decision making should be a separate patch. * The decision for MAC_2500FD is made differently - sja1105_init_mii_settings() allows PHY_INTERFACE_MODE_2500BASEX when supports_2500basex[] is non-zero, and is not based on any other condition such as supports_sgmii[] or supports_rgmii[]. Vladimir's patch makes it additionally conditional on those supports_.gmii[] settings, which is a functional change that should be made in a separate patch - and if desired, then sja1105_init_mii_settings() should also be updated at the same time. Consequently, I believe that my previous objections to Vladimir's single patch approach are well founded and justified, even though Vladimir is the maintainer of this driver. I have no objection to the additional changes, I just don't think they should all be wrapped up into a single patch that converts the way validation is done _and_ also makes a bunch of other functional changes. RFC->non-RFC: added Vladimir's Reviewed-by's, fixed the typo in the commit message of patch 6, and removed the phrase at the end of a comment as requested. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-25 12:47:20 +00:00
Russell King (Oracle)	83dc4c2af6	net: dsa: sja1105: support switching between SGMII and 2500BASE-X Vladimir Oltean suggests that sja1105 can support switching between SGMII and 2500BASE-X modes. Augment sja1105_phylink_get_caps() to fill in both interface modes if they can be supported. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-25 12:47:20 +00:00
Russell King (Oracle)	9c318be13c	net: dsa: sja1105: convert to phylink_generic_validate() Populate the MAC capabilities for the SJA1105 DSA switch using the same decision making which sja1105_phylink_validate() uses. Remove the now obsolete sja1105_phylink_validate() implementation to allow DSA to use phylink_generic_validate() for this switch driver. As noted by Vladimir, this fixes an inconsequential bug which allowed gigabit and lower interface modes to be indicated when operating in 2500base-X mode. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-25 12:47:19 +00:00
Russell King (Oracle)	2d1d548ec1	net: dsa: sja1105: mark as non-legacy The sja1105 DSA driver does not have a phylink_mac_config() method implementation, it is safe to mark this as a non-legacy driver. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-25 12:47:19 +00:00
Russell King (Oracle)	827b4ef277	net: dsa: sja1105: use .mac_select_pcs() interface Convert the PCS selection to use mac_select_pcs, which allows the PCS to perform any validation it needs, and removes the need to set the PCS in the mac_config() callback, delving into the higher DSA levels to do so. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-25 12:47:19 +00:00
Russell King (Oracle)	c2b8e1e3d8	net: dsa: sja1105: remove interface checks When the supported interfaces bitmap is populated, phylink will itself check that the interface mode is present in this bitmap. Drivers no longer need to perform this check themselves. Remove these checks. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-25 12:47:19 +00:00
Russell King (Oracle)	a420b757ac	net: dsa: sja1105: populate supported_interfaces Populate the supported interfaces bitmap for the SJA1105 DSA switch. This switch only supports a static model of configuration, so we restrict the interface modes to the configured setting. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-25 12:47:19 +00:00
Toms Atteka	28a3f06017	net: openvswitch: IPv6: Add IPv6 extension header support This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and packets can be filtered using ipv6_ext flag. Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-25 10:32:55 +00:00
Jakub Kicinski	a46e3d5eb7	Merge branch 'nfp-flow-independent-tc-action-hardware-offload' Simon Horman says: ==================== nfp: flow-independent tc action hardware offload Baowen Zheng says: Allow nfp NIC to offload tc actions independent of flows. The motivation for this work is to offload tc actions independent of flows for nfp NIC. We allow nfp driver to provide hardware offload of OVS metering feature - which calls for policers that may be used by multiple flows and whose lifecycle is independent of any flows that use them. When nfp driver tries to offload a flow table using the independent action, the driver will search if the action is already offloaded to the hardware. If not, the flow table offload will fail. When the nfp NIC succeeds in offloading an action, the user can check in_hw_count when dumping the tc action. Tc cli command to offload and dump an action: # tc actions add action police rate 100mbit burst 10000k index 200 skip_sw # tc -s -d actions list action police total acts 1 action order 0: police 0xc8 rate 100Mbit burst 10000Kb mtu 2Kb action reclassify overhead 0b linklayer ethernet ref 1 bind 0 installed 142 sec used 0 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 skip_sw in_hw in_hw_count 1 used_hw_stats delayed ==================== Link: https://lore.kernel.org/r/20220223162302.97609-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:51:11 -08:00
Baowen Zheng	5e98743cfa	nfp: add NFP_FL_FEATS_QOS_METER to host features to enable meter offload Add NFP_FL_FEATS_QOS_METER to host features to enable meter offload in driver. Before adding this feature, we will not offload any police action since we will check the host features before offloading any police action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:51:08 -08:00
Baowen Zheng	147747ec66	nfp: add support to offload police action from flower table Offload flow table if the action is already offloaded to hardware when flow table uses this action. Change meter id to type of u32 to support all the action index. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:51:08 -08:00
Baowen Zheng	776178a5cc	nfp: add process to get action stats from hardware Add a process to update action stats from hardware. This stats data will be updated to tc action when dumping actions or filters. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:51:07 -08:00
Baowen Zheng	26ff98d7dd	nfp: add hash table to store meter table Add a hash table to store meter table. This meter table will also be used by flower action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:51:07 -08:00
Baowen Zheng	59080da090	nfp: add support to offload tc action to hardware Add process to offload tc action to hardware. Currently we only support to offload police action. Add meter capability to check if firmware supports meter offload. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:51:07 -08:00
Baowen Zheng	bbab5f9332	nfp: refactor policer config to support ingress/egress meter Add a policer API to support ingress/egress meter. Change the ingress police to be compatible with the new API. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:51:07 -08:00
Dmitry Safonov	7bbb765b73	net/tcp: Merge TCP-MD5 inbound callbacks The functions do essentially the same work to verify TCP-MD5 sign. Code can be merged into one family-independent function in order to reduce copy'n'paste and generated code. Later with TCP-AO option added, this will allow to create one function that's responsible for segment verification, that will have all the different checks for MD5/AO/non-signed packets, which in turn will help to see checks for all corner-cases in one function, rather than spread around different families and functions. Cc: Eric Dumazet <edumazet@google.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220223175740.452397-1-dima@arista.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:43:53 -08:00
Jakub Kicinski	53110c67e3	Merge branch 'fdb-entries-on-dsa-lag-interfaces' Vladimir Oltean says: ==================== FDB entries on DSA LAG interfaces This work permits having static and local FDB entries on LAG interfaces that are offloaded by DSA ports. New API needs to be introduced in drivers. To maintain consistency with the bridging offload code, I've taken the liberty to reorganize the data structures added by Tobias in the DSA core a little bit. Tested on NXP LS1028A (felix switch). Would appreciate feedback/testing on other platforms too. Testing procedure was the one described here: https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/ with this script: ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link add br0 type bridge && ip link set br0 up ip link set br0 arp off ip link set bond0 master br0 && ip link set bond0 up ip link set swp0 master br0 && ip link set swp0 up ip link set dev bond0 type bridge_slave flood off learning off bridge fdb add dev bond0 <mac address of other eno0> master static I'm noticing a problem in 'bridge fdb dump' with the 'self' entries, and I didn't solve this. On Ocelot, an entry learned on a LAG is reported as being on the first member port of it (so instead of saying 'self bond0', it says 'self swp1'). This is better than not seeing the entry at all, but when DSA queries for the FDBs on a port via ds->ops->port_fdb_dump, it never queries for FDBs on a LAG. Not clear what we should do there, we aren't in control of the ->ndo_fdb_dump of the bonding/team drivers. Alternatively, we could just consider the 'self' entries reported via ndo_fdb_dump as "better than nothing", and concentrate on the 'master' entries that are in sync with the bridge when packets are flooded to software. 
==================== Link: https://lore.kernel.org/r/20220223140054.3379617-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:47 -08:00
Vladimir Oltean	961d8b6990	net: dsa: felix: support FDB entries on offloaded LAG interfaces This adds the logic in the Felix DSA driver and Ocelot switch library. For Ocelot switches, the DEST_IDX that is the output of the MAC table lookup is a logical port (equal to physical port, if no LAG is used, or a dynamically allocated number otherwise). The allocation we have in place for LAG IDs is different from DSA's, so we can't use that: - DSA allocates a continuous range of LAG IDs starting from 1 - Ocelot appears to require that physical ports and LAG IDs are in the same space of [0, num_phys_ports), and additionally, ports that aren't in a LAG must have physical port id == logical port id The implication is that an FDB entry towards a LAG might need to be deleted and reinstalled when the LAG ID changes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:44 -08:00
Vladimir Oltean	e212fa7c54	net: dsa: support FDB events on offloaded LAG interfaces This change introduces support for installing static FDB entries towards a bridge port that is a LAG of multiple DSA switch ports, as well as support for filtering towards the CPU local FDB entries emitted for LAG interfaces that are bridge ports. Conceptually, host addresses on LAG ports are identical to what we do for plain bridge ports. Whereas FDB entries _towards_ a LAG can't simply be replicated towards all member ports like we do for multicast, or VLAN. Instead we need new driver API. Hardware usually considers a LAG to be a "logical port", and sets the entire LAG as the forwarding destination. The physical egress port selection within the LAG is made by hashing policy, as usual. To represent the logical port corresponding to the LAG, we pass by value a copy of the dsa_lag structure to all switches in the tree that have at least one port in that LAG. To illustrate why a refcounted list of FDB entries is needed in struct dsa_lag, it is enough to say that: - a LAG may be a bridge port and may therefore receive FDB events even while it isn't yet offloaded by any DSA interface - DSA interfaces may be removed from a LAG while that is a bridge port; we don't want FDB entries lingering around, but we don't want to remove entries that are still in use, either For all the cases below to work, the idea is to always keep an FDB entry on a LAG with a reference count equal to the DSA member ports. 
So: - if a port joins a LAG, it requests the bridge to replay the FDB, and the FDB entries get created, or their refcount gets bumped by one - if a port leaves a LAG, the FDB replay deletes or decrements refcount by one - if an FDB is installed towards a LAG with ports already present, that entry is created (if it doesn't exist) and its refcount is bumped by the amount of ports already present in the LAG echo "Adding FDB entry to bond with existing ports" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link del br0 ip link del bond0 echo "Adding FDB entry to empty bond" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link del bond0 echo "Adding FDB entry to empty bond, then removing ports one by one" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link set swp1 nomaster ip link set swp2 nomaster ip link del br0 ip link del bond0 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:44 -08:00
Vladimir Oltean	93c798230a	net: dsa: call SWITCHDEV_FDB_OFFLOADED for the orig_dev When switchdev_handle_fdb_event_to_device() replicates a FDB event emitted for the bridge or for a LAG port and DSA offloads that, we should notify back to switchdev that the FDB entry on the original device is what was offloaded, not on the DSA slave devices that the event is replicated on. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:43 -08:00
Vladimir Oltean	e35f12e993	net: dsa: remove "ds" and "port" from struct dsa_switchdev_event_work By construction, the struct net_device *dev passed to dsa_slave_switchdev_event_work() via struct dsa_switchdev_event_work is always a DSA slave device. Therefore, it is redundant to pass struct dsa_switch and int port information in the deferred work structure. This can be retrieved at all times from the provided struct net_device via dsa_slave_to_port(). For the same reason, we can drop the dsa_is_user_port() check in dsa_fdb_offload_notify(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:43 -08:00
Vladimir Oltean	ec638740fc	net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device When the switchdev_handle_fdb_event_to_device() event replication helper was created, my original thought was that FDB events on LAG interfaces should most likely be special-cased, not just replicated towards all switchdev ports beneath that LAG. So this replication helper currently does not recurse through switchdev lower interfaces of LAG bridge ports, but rather calls the lag_mod_cb() if that was provided. No switchdev driver uses this helper for FDB events on LAG interfaces yet, so that was an assumption which was yet to be tested. It is certainly usable for that purpose, as my RFC series shows: https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/ however this approach is slightly convoluted because: - the switchdev driver gets a "dev" that isn't its own net device, but rather the LAG net device. It must call switchdev_lower_dev_find(dev) in order to get a handle of any of its own net devices (the ones that pass check_cb). - in order for FDB entries on LAG ports to be correctly refcounted per the number of switchdev ports beneath that LAG, we haven't escaped the need to iterate through the LAG's lower interfaces. Except that is now the responsibility of the switchdev driver, because the replication helper just stopped half-way. So, even though yes, FDB events on LAG bridge ports must be special-cased, in the end it's simpler to let switchdev_handle_fdb_* just iterate through the LAG port's switchdev lowers, and let the switchdev driver figure out that those physical ports are under a LAG. The switchdev_handle_fdb_event_to_device() helper takes a "foreign_dev_check" callback so it can figure out whether @dev can autonomously forward to @foreign_dev. DSA fills this method properly: if the LAG is offloaded by another port in the same tree as @dev, then it isn't foreign. 
If it is a software LAG, it is foreign - forwarding happens in software. Whether an interface is foreign or not decides whether the replication helper will go through the LAG's switchdev lowers or not. Since the lan966x doesn't properly fill this out, FDB events on software LAG uppers will get called. By changing lan966x_foreign_dev_check(), we can suppress them. Whereas DSA will now start receiving FDB events for its offloaded LAG uppers, so we need to return -EOPNOTSUPP, since we currently don't do the right thing for them. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:43 -08:00
Vladimir Oltean	dedd6a009f	net: dsa: create a dsa_lag structure The main purpose of this change is to create a data structure for a LAG as seen by DSA. This is similar to what we have for bridging - we pass a copy of this structure by value to ->port_lag_join and ->port_lag_leave. For now we keep the lag_dev, id and a reference count in it. Future patches will add a list of FDB entries for the LAG (these also need to be refcounted to work properly). The LAG structure is created using dsa_port_lag_create() and destroyed using dsa_port_lag_destroy(), just like we have for bridging. Because now, the dsa_lag itself is refcounted, we can simplify dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in the dst->lags array only as long as at least one port uses it. The refcounting logic inside those functions can be removed now - they are called only when we should perform the operation. dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag structure instead of the lag_dev net_device. dsa_lag_foreach_port() now takes the dsa_lag structure as argument. dst->lags holds an array of dsa_lag structures. dsa_lag_map() now also saves the dsa_lag->id value, so that linear walking of dst->lags in drivers using dsa_lag_id() is no longer necessary. They can just look at lag.id. dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(), which can be used by drivers to get the LAG ID assigned by DSA to a given port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:43 -08:00
Vladimir Oltean	b99dbdf00b	net: dsa: mv88e6xxx: use dsa_switch_for_each_port in mv88e6xxx_lag_sync_masks Make the intent of the code more clear by using the dedicated helper for iterating over the ports of a switch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:43 -08:00
Vladimir Oltean	3d4a0a2a46	net: dsa: make LAG IDs one-based The DSA LAG API will be changed to become more similar with the bridge data structures, where struct dsa_bridge holds an unsigned int num, which is generated by DSA and is one-based. We have a similar thing going with the DSA LAG, except that isn't stored anywhere, it is calculated dynamically by dsa_lag_id() by iterating through dst->lags. The idea of encoding an invalid (or not requested) LAG ID as zero for the purpose of simplifying checks in drivers means that the LAG IDs passed by DSA to drivers need to be one-based too. So back-and-forth conversion is needed when indexing the dst->lags array, as well as in drivers which assume a zero-based index. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:42 -08:00
Vladimir Oltean	066ce9779c	net: dsa: qca8k: rename references to "lag" as "lag_dev" In preparation of converting struct net_device dp->lag_dev into a struct dsa_lag dp->lag, we need to rename, for consistency purposes, all occurrences of the "lag" variable in qca8k to "lag_dev". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:42 -08:00
Vladimir Oltean	e23eba7228	net: dsa: mv88e6xxx: rename references to "lag" as "lag_dev" In preparation of converting struct net_device dp->lag_dev into a struct dsa_lag dp->lag, we need to rename, for consistency purposes, all occurrences of the "lag" variable in mv88e6xxx to "lag_dev". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:42 -08:00
Vladimir Oltean	46a76724e4	net: dsa: rename references to "lag" as "lag_dev" In preparation of converting struct net_device dp->lag_dev into a struct dsa_lag dp->lag, we need to rename, for consistency purposes, all occurrences of the "lag" variable in the DSA core to "lag_dev". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:42 -08:00
Oleksij Rempel	89183b6ea8	net: asix: remove code duplicates in asix_mdio_read/write and asix_mdio_read/write_nopm These functions are mostly the same except for one hard-coded "in_pm" variable. So, rework them to reduce maintenance overhead. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20220223110633.3006551-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:21:30 -08:00
Yang Yingliang	37f40f81e5	net: marvell: prestera: Fix return value check in prestera_kern_fib_cache_find() rhashtable_lookup_fast() returns NULL pointer not ERR_PTR(), so it can return fib_node directly in prestera_kern_fib_cache_find(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220223084954.1771075-2-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:21:27 -08:00
Yang Yingliang	d434ee9dee	net: marvell: prestera: Fix return value check in prestera_fib_node_find() rhashtable_lookup_fast() returns NULL pointer not ERR_PTR(), so it can return fib_node directly in prestera_fib_node_find(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220223084954.1771075-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:21:20 -08:00
Casper Andersson	06388a03d2	net: sparx5: Support offloading of bridge port flooding flags Though the SparX-5i can control IPv4/6 multicasts separately from non-IP multicasts, these are all muxed onto the bridge's BR_MCAST_FLOOD flag. Signed-off-by: Casper Andersson <casper.casan@gmail.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20220223082700.qrot7lepwqcdnyzw@wse-c0155 Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:17:20 -08:00
Jakub Kicinski	aaa25a2fa7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/testing/selftests/net/mptcp/mptcp_join.sh `34aa6e3bcc` ("selftests: mptcp: add ip mptcp wrappers") `857898eb4b` ("selftests: mptcp: add missing join check") `6ef84b1517` ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c `fb7e76ea3f` ("net/mlx5e: TC, Skip redundant ct clear actions") `c63741b426` ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") `09bf979232` ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") `84ba8062e3` ("net/mlx5e: Test CT and SAMPLE on flow attr") `efe6f961cd` ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") `3b49a7edec` ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 17:54:25 -08:00

1075241 Commits All Branches Search