OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Vladimir Oltean	d5f19486ce	net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors Some DSA switches (and not only) cannot learn source MAC addresses from packets injected from the CPU. They only perform hardware address learning from inbound traffic. This can be problematic when we have a bridge spanning some DSA switch ports and some non-DSA ports (which we'll call "foreign interfaces" from DSA's perspective). There are 2 classes of problems created by the lack of learning on CPU-injected traffic: - excessive flooding, due to the fact that DSA treats those addresses as unknown - the risk of stale routes, which can lead to temporary packet loss To illustrate the second class, consider the following situation, which is common in production equipment (wireless access points, where there is a WLAN interface and an Ethernet switch, and these form a single bridging domain). AP 1: +------------------------------------------------------------------------+ \| br0 \| +------------------------------------------------------------------------+ +------------+ +------------+ +------------+ +------------+ +------------+ \| swp0 \| \| swp1 \| \| swp2 \| \| swp3 \| \| wlan0 \| +------------+ +------------+ +------------+ +------------+ +------------+ \| ^ ^ \| \| \| \| \| \| \| Client A Client B \| \| \| +------------+ +------------+ +------------+ +------------+ +------------+ \| swp0 \| \| swp1 \| \| swp2 \| \| swp3 \| \| wlan0 \| +------------+ +------------+ +------------+ +------------+ +------------+ +------------------------------------------------------------------------+ \| br0 \| +------------------------------------------------------------------------+ AP 2 - br0 of AP 1 will know that Clients A and B are reachable via wlan0 - the hardware fdb of a DSA switch driver today is not kept in sync with the software entries on other bridge ports, so it will not know that clients A and B are reachable via the CPU port UNLESS the hardware switch itself performs SA learning from traffic injected from the CPU. Nonetheless, a substantial number of switches don't. - the hardware fdb of the DSA switch on AP 2 may autonomously learn that Client A and B are reachable through swp0. Therefore, the software br0 of AP 2 also may or may not learn this. In the example we're illustrating, some Ethernet traffic has been going on, and br0 from AP 2 has indeed learnt that it can reach Client B through swp0. One of the wireless clients, say Client B, disconnects from AP 1 and roams to AP 2. The topology now looks like this: AP 1: +------------------------------------------------------------------------+ \| br0 \| +------------------------------------------------------------------------+ +------------+ +------------+ +------------+ +------------+ +------------+ \| swp0 \| \| swp1 \| \| swp2 \| \| swp3 \| \| wlan0 \| +------------+ +------------+ +------------+ +------------+ +------------+ \| ^ \| \| \| Client A \| \| \| Client B \| \| \| v +------------+ +------------+ +------------+ +------------+ +------------+ \| swp0 \| \| swp1 \| \| swp2 \| \| swp3 \| \| wlan0 \| +------------+ +------------+ +------------+ +------------+ +------------+ +------------------------------------------------------------------------+ \| br0 \| +------------------------------------------------------------------------+ AP 2 - br0 of AP 1 still knows that Client A is reachable via wlan0 (no change) - br0 of AP 1 will (possibly) know that Client B has left wlan0. There are cases where it might never find out though. Either way, DSA today does not process that notification in any way. - the hardware FDB of the DSA switch on AP 1 may learn autonomously that Client B can be reached via swp0, if it receives any packet with Client 1's source MAC address over Ethernet. - the hardware FDB of the DSA switch on AP 2 still thinks that Client B can be reached via swp0. It does not know that it has roamed to wlan0, because it doesn't perform SA learning from the CPU port. Now Client A contacts Client B. AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet segment. AP 2 sees a frame on swp0 and its fdb says that the destination is swp0. Hairpinning is disabled => drop. This problem comes from the fact that these switches have a 'blind spot' for addresses coming from software bridging. The generic solution is not to assume that hardware learning can be enabled somehow, but to listen to more bridge learning events. It turns out that the bridge driver does learn in software from all inbound frames, in __br_handle_local_finish. A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the addresses serviced by the bridge on 'foreign' interfaces. The software bridge also does the right thing on migration, by notifying that the old entry is deleted, so that does not need to be special-cased in DSA. When it is deleted, we just need to delete our static FDB entry towards the CPU too, and wait. The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE events received on its own interfaces, such as static FDB entries. Luckily we can change that, and DSA can listen to all switchdev FDB add/del events in the system and figure out if those events were emitted by a bridge that spans at least one of DSA's own ports. In case that is true, DSA will also offload that address towards its own CPU port, in the eventuality that there might be bridge clients attached to the DSA switch who want to talk to the station connected to the foreign interface. In terms of implementation, we need to keep the fdb_info->added_by_user check for the case where the switchdev event was targeted directly at a DSA switch port. But we don't need to look at that flag for snooped events. So the check is currently too late, we need to move it earlier. This also simplifies the code a bit, since we avoid uselessly allocating and freeing switchdev_work. We could probably do some improvements in the future. For example, multi-bridge support is rudimentary at the moment. If there are two bridges spanning a DSA switch's ports, and both of them need to service the same MAC address, then what will happen is that the migration of one of those stations will trigger the deletion of the FDB entry from the CPU port while it is still used by other bridge. That could be improved with reference counting but is left for another time. This behavior needs to be enabled at driver level by setting ds->assisted_learning_on_cpu_port = true. This is because we don't want to inflict a potential performance penalty (accesses through MDIO/I2C/SPI are expensive) to hardware that really doesn't need it because address learning on the CPU port works there. Reported-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:46 -08:00
Vladimir Oltean	5fb4a451a8	net: dsa: exit early in dsa_slave_switchdev_event if we can't program the FDB Right now, the following would happen for a switch driver that does not implement .port_fdb_add or .port_fdb_del. dsa_slave_switchdev_event returns NOTIFY_OK and schedules: -> dsa_slave_switchdev_event_work -> dsa_port_fdb_add -> dsa_port_notify(DSA_NOTIFIER_FDB_ADD) -> dsa_switch_fdb_add -> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP; -> an error is printed with dev_dbg, and dsa_fdb_offload_notify(switchdev_work) is not called. We can avoid scheduling the worker for nothing and say NOTIFY_DONE. Because we don't call dsa_fdb_offload_notify, the static FDB entry will remain just in the software bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:46 -08:00
Vladimir Oltean	447d290a58	net: dsa: move switchdev event implementation under the same switch/case statement We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events even for interfaces where dsa_slave_dev_check returns false, so we need that check inside the switch-case statement for SWITCHDEV_FDB_*. This movement also avoids a useless allocation / free of switchdev_work on the untreated "default event" case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:46 -08:00
Vladimir Oltean	c4bb76a9a0	net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work Currently DSA doesn't add FDB entries on the CPU port, because it only does so through switchdev, which is associated with a net_device, and there are none of those for the CPU port. But actually FDB addresses on the CPU port have some use cases of their own, if the switchdev operations are initiated from within the DSA layer. There is just one problem with the existing code: it passes a structure in dsa_switchdev_event_work which was retrieved directly from switchdev, so it contains a net_device. We need to generalize the contents to something that covers the CPU port as well: the "ds, port" tuple is fine for that. Note that the new procedure for notifying the successful FDB offload is inspired from the rocker model. Also, nothing was being done if added_by_user was false. Let's check for that a lot earlier, and don't actually bother to schedule the worker for nothing. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:45 -08:00
Vladimir Oltean	2fd186501b	net: dsa: be louder when a non-legacy FDB operation fails The dev_close() call was added in commit `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification") "to indicate inconsistent situation" when we could not delete an FDB entry from the port. bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master It is a bit drastic and at the same time not helpful if the above fails to only print with netdev_dbg log level, but on the other hand to bring the interface down. So increase the verbosity of the error message, and drop dev_close(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:45 -08:00
Vladimir Oltean	90dc8fd360	net: bridge: notify switchdev of disappearance of old FDB entry upon migration Currently the bridge emits atomic switchdev notifications for dynamically learnt FDB entries. Monitoring these notifications works wonders for switchdev drivers that want to keep their hardware FDB in sync with the bridge's FDB. For example station A wants to talk to station B in the diagram below, and we are concerned with the behavior of the bridge on the DUT device: DUT +-------------------------------------+ \| br0 \| \| +------+ +------+ +------+ +------+ \| \| \| \| \| \| \| \| \| \| \| \| \| swp0 \| \| swp1 \| \| swp2 \| \| eth0 \| \| +-------------------------------------+ \| \| \| Station A \| \| \| \| +--+------+--+ +--+------+--+ \| \| \| \| \| \| \| \| \| \| swp0 \| \| \| \| swp0 \| \| Another \| +------+ \| \| +------+ \| Another switch \| br0 \| \| br0 \| switch \| +------+ \| \| +------+ \| \| \| \| \| \| \| \| \| \| \| swp1 \| \| \| \| swp1 \| \| +--+------+--+ +--+------+--+ \| Station B Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has the following property: frames injected from its control interface bypass the internal address analyzer logic, and therefore, this hardware does not learn from the source address of packets transmitted by the network stack through it. So, since bridging between eth0 (where Station B is attached) and swp0 (where Station A is attached) is done in software, the switchdev hardware will never learn the source address of Station B. So the traffic towards that destination will be treated as unknown, i.e. flooded. This is where the bridge notifications come in handy. When br0 on the DUT sees frames with Station B's MAC address on eth0, the switchdev driver gets these notifications and can install a rule to send frames towards Station B's address that are incoming from swp0, swp1, swp2, only towards the control interface. This is all switchdev driver private business, which the notification makes possible. All is fine until someone unplugs Station B's cable and moves it to the other switch: DUT +-------------------------------------+ \| br0 \| \| +------+ +------+ +------+ +------+ \| \| \| \| \| \| \| \| \| \| \| \| \| swp0 \| \| swp1 \| \| swp2 \| \| eth0 \| \| +-------------------------------------+ \| \| \| Station A \| \| \| \| +--+------+--+ +--+------+--+ \| \| \| \| \| \| \| \| \| \| swp0 \| \| \| \| swp0 \| \| Another \| +------+ \| \| +------+ \| Another switch \| br0 \| \| br0 \| switch \| +------+ \| \| +------+ \| \| \| \| \| \| \| \| \| \| \| swp1 \| \| \| \| swp1 \| \| +--+------+--+ +--+------+--+ \| Station B Luckily for the use cases we care about, Station B is noisy enough that the DUT hears it (on swp1 this time). swp1 receives the frames and delivers them to the bridge, who enters the unlikely path in br_fdb_update of updating an existing entry. It moves the entry in the software bridge to swp1 and emits an addition notification towards that. As far as the switchdev driver is concerned, all that it needs to ensure is that traffic between Station A and Station B is not forever broken. If it does nothing, then the stale rule to send frames for Station B towards the control interface remains in place. But Station B is no longer reachable via the control interface, but via a port that can offload the bridge port learning attribute. It's just that the port is prevented from learning this address, since the rule overrides FDB updates. So the rule needs to go. The question is via what mechanism. It sure would be possible for this switchdev driver to keep track of all addresses which are sent to the control interface, and then also listen for bridge notifier events on its own ports, searching for the ones that have a MAC address which was previously sent to the control interface. But this is cumbersome and inefficient. Instead, with one small change, the bridge could notify of the address deletion from the old port, in a symmetrical manner with how it did for the insertion. Then the switchdev driver would not be required to monitor learn/forget events for its own ports. It could just delete the rule towards the control interface upon bridge entry migration. This would make hardware address learning be possible again. Then it would take a few more packets until the hardware and software FDB would be in sync again. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:45 -08:00
Jakub Kicinski	0565ff56cd	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2021-01-07 We've added 4 non-merge commits during the last 10 day(s) which contain a total of 4 files changed, 14 insertions(+), 7 deletions(-). The main changes are: 1) Fix task_iter bug caused by the merge conflict resolution, from Yonghong. 2) Fix resolve_btfids for multiple type hierarchies, from Jiri. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpftool: Fix compilation failure for net.o with older glibc tools/resolve_btfids: Warn when having multiple IDs for single type bpf: Fix a task_iter bug caused by a merge conflict resolution selftests/bpf: Fix a compile error for BPF_F_BPRM_SECUREEXEC ==================== Link: https://lore.kernel.org/r/20210107221555.64959-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:10:27 -08:00
Jakub Kicinski	dd15c4a0ba	Merge branch 'r8169-improve-rtl8168g-phy-suspend-quirk' Heiner Kallweit says: ==================== r8169: improve RTL8168g PHY suspend quirk According to Realtek the ERI register 0x1a8 quirk is needed to work around a hw issue with the PHY on RTL8168g. The register needs to be changed before powering down the PHY. Currently we don't meet this requirement, however I'm not aware of any problems caused by this. Therefore I see the change as an improvement. The PHY driver has no means to access the chip ERI registers, therefore we have to intercept MDIO writes to the BMCR register. If the BMCR_PDOWN bit is going to be set, then let's apply the quirk before actually powering down the PHY. ==================== Link: https://lore.kernel.org/r/9303c2cf-c521-beea-c09f-63b5dfa91b9c@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 14:56:36 -08:00
Heiner Kallweit	acb58657c8	r8169: improve RTL8168g PHY suspend quirk According to Realtek the ERI register 0x1a8 quirk is needed to work around a hw issue with the PHY on RTL8168g. The register needs to be changed before powering down the PHY. Currently we don't meet this requirement, however I'm not aware of any problems caused by this. Therefore I see the change as an improvement. The PHY driver has no means to access the chip ERI registers, therefore we have to intercept MDIO writes to BMCR register. If the BMCR_PDOWN bit is going to be set, then let's apply the quirk before actually powering down the PHY. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 14:56:33 -08:00
Heiner Kallweit	c6cff9dfeb	r8169: move ERI access functions to avoid forward declaration No functional change here. We just move a code block to avoid a function forward declaration in a subsequent change. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 14:56:33 -08:00
Heiner Kallweit	e6e918d4eb	net: phy: replace mutex_is_locked with lockdep_assert_held in phylib Switch to lockdep_assert_held(_once), similar to what is being done in other subsystems. One advantage is that there's zero runtime overhead if lockdep support isn't enabled. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/ccc40b9d-8ee0-43a1-5009-2cc95ca79c85@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 14:53:07 -08:00
Florian Fainelli	8b86850bf9	net: phy: bcm7xxx: Add an entry for BCM72116 BCM72116 features a 28nm integrated EPHY, add an entry to match this PHY OUI. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210106170944.1253046-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 14:46:09 -08:00
Jakub Kicinski	704a0f858e	Merge branch 'net-fix-netfilter-defrag-ip-tunnel-pmtu-blackhole' Florian Westphal says: ==================== net: fix netfilter defrag/ip tunnel pmtu blackhole Christian Perle reported a PMTU blackhole due to unexpected interaction between the ip defragmentation that comes with connection tracking and ip tunnels. Unfortunately setting 'nopmtudisc' on the tunnel breaks the test scenario even without netfilter. Christinas setup looks like this: +--------+ +---------+ +--------+ \|Router A\|-------\|Wanrouter\|-------\|Router B\| \| \|.IPIP..\| \|..IPIP.\| \| +--------+ +---------+ +--------+ / mtu 1400 \ / \ +--------+ +--------+ \|Client A\| \|Client B\| +--------+ +--------+ MTU is 1500 everywhere, except on Router A to Wanrouter and Wanrouter to Router B. Router A and Router B use IPIP tunnel interfaces to tunnel traffic between Client A and Client B over WAN. Client A sends a 1400 byte UDP datagram to Client B. This packet gets encapsulated in the IPIP tunnel. This works, packet is received on client B. When conntrack (or anything else that forces ip defragmentation) is enabled on Router A, the packet gets dropped on Router A after encapsulation because they exceed the link MTU. Setting the 'nopmtudisc' flag on the IPIP tunnel makes things worse, no packets pass even in the no-netfilter scenario. Patch one is a reproducer script for selftest infra. Patch two is a fix for 'nopmtudisc' behaviour so ip_tunnel will send an icmp error to Client A. This allows 'nopmtudisc' tunnel to forward the UDP datagrams. Patch three enables ip refragmentation for all reassembled packets, just like ipv6. ==================== Link: https://lore.kernel.org/r/20210105231523.622-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 14:42:37 -08:00
Florian Westphal	bb4cc1a188	net: ip: always refragment ip defragmented packets Conntrack reassembly records the largest fragment size seen in IPCB. However, when this gets forwarded/transmitted, fragmentation will only be forced if one of the fragmented packets had the DF bit set. In that case, a flag in IPCB will force fragmentation even if the MTU is large enough. This should work fine, but this breaks with ip tunnels. Consider client that sends a UDP datagram of size X to another host. The client fragments the datagram, so two packets, of size y and z, are sent. DF bit is not set on any of these packets. Middlebox netfilter reassembles those packets back to single size-X packet, before routing decision. packet-size-vs-mtu checks in ip_forward are irrelevant, because DF bit isn't set. At output time, ip refragmentation is skipped as well because x is still smaller than the mtu of the output device. If ttransmit device is an ip tunnel, the packet size increases to x+overhead. Also, tunnel might be configured to force DF bit on outer header. In this case, packet will be dropped (exceeds MTU) and an ICMP error is generated back to sender. But sender already respects the announced MTU, all the packets that it sent did fit the announced mtu. Force refragmentation as per original sizes unconditionally so ip tunnel will encapsulate the fragments instead. The only other solution I see is to place ip refragmentation in the ip_tunnel code to handle this case. Fixes: `d6b915e29f` ("ip_fragment: don't forward defragmented DF packet") Reported-by: Christian Perle <christian.perle@secunet.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 14:42:36 -08:00
Florian Westphal	50c661670f	net: fix pmtu check in nopmtudisc mode For some reason ip_tunnel insist on setting the DF bit anyway when the inner header has the DF bit set, EVEN if the tunnel was configured with 'nopmtudisc'. This means that the script added in the previous commit cannot be made to work by adding the 'nopmtudisc' flag to the ip tunnel configuration. Doing so breaks connectivity even for the without-conntrack/netfilter scenario. When nopmtudisc is set, the tunnel will skip the mtu check, so no icmp error is sent to client. Then, because inner header has DF set, the outer header gets added with DF bit set as well. IP stack then sends an error to itself because the packet exceeds the device MTU. Fixes: `23a3647bc4` ("ip_tunnels: Use skb-len to PMTU check.") Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 14:42:36 -08:00
Florian Westphal	9e7a67dee2	selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking Convert Christians bug description into a reproducer. Cc: Shuah Khan <shuah@kernel.org> Reported-by: Christian Perle <christian.perle@secunet.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 14:42:31 -08:00
Jakub Kicinski	0b86235d8f	Merge branch 'udp_tunnel_nic-post-conversion-cleanup' udp_tunnel_nic: post conversion cleanup It has been two releases since we added the common infra for UDP tunnel port offload, and we have not heard of any major issues. Remove the old direct driver NDOs completely, and perform minor simplifications in the tunnel drivers. Link: https://lore.kernel.org/r/20210106210637.1839662-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:53:32 -08:00
Jakub Kicinski	b9ef3fecd1	udp_tunnel: reshuffle NETIF_F_RX_UDP_TUNNEL_PORT checks Move the NETIF_F_RX_UDP_TUNNEL_PORT feature check into udp_tunnel_nic_*_port() helpers, since they're always done right before the call. Add similar checks before calling the notifier. udp_tunnel_nic invokes the notifier without checking features which could result in some wasted cycles. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:53:29 -08:00
Jakub Kicinski	30bfce1094	net: remove ndo_udp_tunnel_* callbacks All UDP tunnel port management is now routed via udp_tunnel_nic infra directly. Remove the old callbacks. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:53:29 -08:00
Jakub Kicinski	dedc33e7df	udp_tunnel: remove REGISTER/UNREGISTER handling from tunnel drivers udp_tunnel_nic handles REGISTER and UNREGISTER event, now that all drivers use that infra we can drop the event handling in the tunnel drivers. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:53:29 -08:00
Jakub Kicinski	876c4384ae	udp_tunnel: hard-wire NDOs to udp_tunnel_nic__port() helpers All drivers use udp_tunnel_nic__port() helpers, prepare for NDO removal by invoking those helpers directly. The helpers are safe to call on all devices, they check if device has the UDP tunnel state initialized. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:53:29 -08:00
Florian Fainelli	ddb4d32ed6	net: broadcom: Drop OF dependency from BGMAC_PLATFORM All of the OF code that is used has stubbed and will compile and link just fine, keeping COMPILE_TEST is enough. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210106191546.1358324-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:51:15 -08:00
Lukas Bulwahn	f3562f5e00	docs: octeontx2: tune rst markup Commit `80b9414832` ("docs: octeontx2: Add Documentation for NPA health reporters") added new documentation with improper formatting for rst, and caused a few new warnings for make htmldocs in octeontx2.rst:169--202. Tune markup and formatting for better presentation in the HTML view. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: George Cherian <george.cherian@marvell.com> Link: https://lore.kernel.org/r/20210106161735.21751-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:47:31 -08:00
Jakub Kicinski	c61ce06f3e	Merge branch 'bcm63xx_enet-major-makeover-of-driver' Sieng Piaw Liew says: ==================== bcm63xx_enet: major makeover of driver This patch series aim to improve the bcm63xx_enet driver by integrating the latest networking features, i.e. batched rx processing, BQL, build_skb, etc. The newer enetsw SoCs are found to be able to do unaligned rx DMA by adding NET_IP_ALIGN padding which, combined with these patches, improved packet processing performance by ~50% on BCM6328. Older non-enetsw SoCs still benefit mainly from rx batching. Performance improvement of ~30% is observed on BCM6333. The BCM63xx SoCs are designed for routers. As such, having BQL is beneficial as well as trivial to add. v3: * Simplify xmit_more patch by not moving around the code needlessly. * Fix indentation in xmit_more patch. * Fix indentation in build_skb patch. * Split rx ring cleanup patch from build_skb patch and precede build_skb patch for better understanding, as suggested by Florian Fainelli. v2: * Add xmit_more support and rx loop improvisation patches. * Moved BQL netdev_reset_queue() to bcm_enet_stop()/bcm_enetsw_stop() functions as suggested by Florian Fainelli. * Improved commit messages. ==================== Link: https://lore.kernel.org/r/20210106144208.1935-1-liew.s.piaw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:39:56 -08:00
Sieng Piaw Liew	ae2259eebe	bcm63xx_enet: improve rx loop Use existing rx processed count to track against budget, thereby making budget decrement operation redundant. rx_desc_count can be calculated outside the rx loop, making the loop a bit smaller. Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:39:53 -08:00
Sieng Piaw Liew	d27de0ef5e	bcm63xx_enet: convert to build_skb We can increase the efficiency of rx path by using buffers to receive packets then build SKBs around them just before passing into the network stack. In contrast, preallocating SKBs too early reduces CPU cache efficiency. Check if we're in NAPI context when refilling RX. Normally we're almost always running in NAPI context. Dispatch to napi_alloc_frag directly instead of relying on netdev_alloc_frag which does the same but with the overhead of local_bh_disable/enable. Tested on BCM6328 320 MHz and iperf3 -M 512 to measure packet/sec performance. Included netif_receive_skb_list and NET_IP_ALIGN optimizations. Before: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 49.9 MBytes 41.9 Mbits/sec 197 sender [ 4] 0.00-10.00 sec 49.3 MBytes 41.3 Mbits/sec receiver After: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-30.00 sec 171 MBytes 47.8 Mbits/sec 272 sender [ 4] 0.00-30.00 sec 170 MBytes 47.6 Mbits/sec receiver Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:39:53 -08:00
Sieng Piaw Liew	3d0b72654b	bcm63xx_enet: consolidate rx SKB ring cleanup code The rx SKB ring use the same code for cleanup at various points. Combine them into a function to reduce lines of code. Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:39:53 -08:00
Sieng Piaw Liew	c4a207865e	bcm63xx_enet: alloc rx skb with NET_IP_ALIGN Use netdev_alloc_skb_ip_align on newer SoCs with integrated switch (enetsw) when refilling RX. Increases packet processing performance by 30% (with netif_receive_skb_list). Non-enetsw SoCs cannot function with the extra pad so continue to use the regular netdev_alloc_skb. Tested on BCM6328 320 MHz and iperf3 -M 512 to measure packet/sec performance. Before: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-30.00 sec 120 MBytes 33.7 Mbits/sec 277 sender [ 4] 0.00-30.00 sec 120 MBytes 33.5 Mbits/sec receiver After (+netif_receive_skb_list): [ 4] 0.00-30.00 sec 155 MBytes 43.3 Mbits/sec 354 sender [ 4] 0.00-30.00 sec 154 MBytes 43.1 Mbits/sec receiver Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:39:53 -08:00
Sieng Piaw Liew	375281d3a6	bcm63xx_enet: add xmit_more support Support bulking hardware TX queue by using netdev_xmit_more(). Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:39:53 -08:00
Sieng Piaw Liew	4c59b0f554	bcm63xx_enet: add BQL support Add Byte Queue Limits support to reduce/remove bufferbloat in bcm63xx_enet. Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:39:53 -08:00
Sieng Piaw Liew	9cbfea02c1	bcm63xx_enet: batch process rx path Use netif_receive_skb_list to batch process rx skb. Tested on BCM6328 320 MHz using iperf3 -M 512, increasing performance by 12.5%. Before: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-30.00 sec 120 MBytes 33.7 Mbits/sec 277 sender [ 4] 0.00-30.00 sec 120 MBytes 33.5 Mbits/sec receiver After: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-30.00 sec 136 MBytes 37.9 Mbits/sec 203 sender [ 4] 0.00-30.00 sec 135 MBytes 37.7 Mbits/sec receiver Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:39:52 -08:00
Dinghao Liu	5b0bb12c58	net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups When mlx5_create_flow_group() fails, ft->g should be freed just like when kvzalloc() fails. The caller of mlx5e_create_l2_table_groups() does not catch this issue on failure, which leads to memleak. Fixes: `33cfaaa8f3` ("net/mlx5e: Split the main flow steering table") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:51 -08:00
Dinghao Liu	7a6eb072a9	net/mlx5e: Fix two double free cases mlx5e_create_ttc_table_groups() frees ft->g on failure of kvzalloc(), but such failure will be caught by its caller in mlx5e_create_ttc_table() and ft->g will be freed again in mlx5e_destroy_flow_table(). The same issue also occurs in mlx5e_create_ttc_table_groups(). Set ft->g to NULL after kfree() to avoid double free. Fixes: `7b3722fa9e` ("net/mlx5e: Support RSS for GRE tunneled packets") Fixes: `33cfaaa8f3` ("net/mlx5e: Split the main flow steering table") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:50 -08:00
Leon Romanovsky	4d8be21112	net/mlx5: Release devlink object if adev fails Add missed freeing previously allocated devlink object. Fixes: `a925b5e309` ("net/mlx5: Register mlx5 devices to auxiliary virtual bus") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:50 -08:00
Aya Levin	b1c0aca3d3	net/mlx5e: ethtool, Fix restriction of autoneg with 56G Prior to this patch, configuring speed to 50G with autoneg off over devices supporting 50G per lane failed. Support for 50G per lane introduced a new set of link-modes, on which driver always performed a speed validation as if only legacy link-modes were configured. Fix driver speed validation to force setting autoneg over 56G only if in legacy link-mode. Fixes: `3d7cadae51` ("net/mlx5e: ethtool, Fix analysis of speed setting") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:50 -08:00
Maor Dickman	e13ed0ac06	net/mlx5e: In skb build skip setting mark in switchdev mode sop_drop_qpn field in the cqe is used by two features, in SWITCHDEV mode to restore the chain id in case of a miss and in LEGACY mode to support skbedit mark action. In build RX skb, the skb mark field is set regardless of the configured mode which cause a corruption of the mark field in case of switchdev mode. Fix by overriding the mark value back to 0 in the representor tc update skb flow. Fixes: `8f1e0b97cc` ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:50 -08:00
Alaa Hleihel	25c904b59a	net/mlx5: E-Switch, fix changing vf VLANID Adding vf VLANID for the first time, or after having cleared previously defined VLANID works fine, however, attempting to change an existing vf VLANID clears the rules on the firmware, but does not add new rules for the new vf VLANID. Fix this by changing the logic in function esw_acl_egress_lgcy_setup() so that it will always configure egress rules. Fixes: `ea651a86d4` ("net/mlx5: E-Switch, Refactor eswitch egress acl codes") Signed-off-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:49 -08:00
Moshe Shemesh	b544011f0e	net/mlx5e: Fix SWP offsets when vlan inserted by driver In case WQE includes inline header the vlan is inserted by driver even if vlan offload is set. On geneve over vlan interface where software parser is used the SWP offsets should be updated according to the added vlan. Fixes: `e3cfc7e6b7` ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:49 -08:00
Oz Shlomo	eed38eeee7	net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled Connection counters may be shared for both directions when the counter is used for connection aging purposes. However, if TC flow accounting is enabled then a unique counter is required per direction. Instantiate a unique counter per direction if the conntrack accounting extension is enabled. Use a shared counter when the connection accounting extension is disabled. Fixes: `1edae2335a` ("net/mlx5e: CT: Use the same counter for both directions") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:49 -08:00
Mark Zhang	0f2dcade69	net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address In multi-port mode, FW reports syndrome 0x2ea48 (invalid vhca_port_number) if the port_num is not 1 or 2. Fixes: `80f09dfc23` ("net/mlx5: Eswitch, enable RoCE loopback traffic") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:48 -08:00
Aya Levin	9c9be85f6b	net/mlx5e: Add missing capability check for uplink follow Expose firmware indication that it supports setting eswitch uplink state to follow (follow the physical link). Condition setting the eswitch uplink admin-state with this capability bit. Older FW may not support the uplink state setting. Fixes: `7d0314b11c` ("net/mlx5e: Modify uplink state on interface up/down") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:48 -08:00
Mark Zhang	abf8ef953a	net/mlx5: Check if lag is supported before creating one This patch fixes a memleak issue by preventing to create a lag and add PFs if lag is not supported. comm “python3”, pid 349349, jiffies 4296985507 (age 1446.976s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ……………. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ……………. backtrace: [<000000005b216ae7>] mlx5_lag_add+0x1d5/0×3f0 [mlx5_core] [<000000000445aa55>] mlx5e_nic_enable+0x66/0×1b0 [mlx5_core] [<00000000c56734c3>] mlx5e_attach_netdev+0x16e/0×200 [mlx5_core] [<0000000030439d1f>] mlx5e_attach+0x5c/0×90 [mlx5_core] [<0000000018fd8615>] mlx5e_add+0x1a4/0×410 [mlx5_core] [<0000000068bc504b>] mlx5_add_device+0x72/0×120 [mlx5_core] [<000000009fce51f9>] mlx5_register_device+0x77/0xb0 [mlx5_core] [<00000000d0d81ff3>] mlx5_load_one+0xc58/0×1eb0 [mlx5_core] [<0000000045077adc>] init_one+0x3ea/0×920 [mlx5_core] [<0000000043287674>] pci_device_probe+0xcd/0×150 [<00000000dafd3279>] really_probe+0x1c9/0×4b0 [<00000000f06bdd84>] driver_probe_device+0x5d/0×140 [<00000000e3d508b6>] device_driver_attach+0x4f/0×60 [<0000000084fba0f0>] bind_store+0xbf/0×120 [<00000000bf6622b3>] kernfs_fop_write+0x114/0×1b0 Fixes: `9b412cc35f` ("net/mlx5e: Add LAG warning if bond slave is not lag master") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-07 12:22:48 -08:00
Linus Torvalds	f5e6c33025	spi: Fixes for v5.11 A couple of core fixes here, both to do with handling of drivers which don't report their maximum speed since we factored some of the handling for transfer speeds out into the core in the previous release. There's also some driver specific fixes, including a relatively large set for some races around timeouts in spi-geni-qcom. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl/3NqwACgkQJNaLcl1U h9C04gf8DHM+rGA6XsQfxjlqPmu0KOq+QVfDEYvXEbpWIQ1YWyB3kK3H6ftvFMJz bWW1xBjJL6nav9pZwRllLu2P1hkXek0MpZFFRp00MrUbUW06ct103xiIGVmVPUMQ A4w44nt7WJj/k9fRyvAjrAp1LQ8pGl0H1m6kifZLpNm8GlUMrCyFz7kfWs6ojHOo uBfru37WFiSfgQeDVTBd9v0A7Nu/dEnMlHCbikifpwsB/Q0GTfs8GG4pIo+QehwQ VhOCi/x+p7JW3Tksp4DM6jawzpBCoTRt5EzNzbprIyPde3vzUj4rjiqkcAxgW3hu oiwkkjM5e3/0oVHebSkmRVuG2qlT8w== =hP9t -----END PGP SIGNATURE----- Merge tag 'spi-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple of core fixes here, both to do with handling of drivers which don't report their maximum speed since we factored some of the handling for transfer speeds out into the core in the previous release. There's also some driver specific fixes, including a relatively large set for some races around timeouts in spi-geni-qcom" * tag 'spi-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: fix the divide by 0 error when calculating xfer waiting time spi: Fix the clamping of spi->max_speed_hz spi: altera: fix return value for altera_spi_txrx() spi: stm32: FIFO threshold level - fix align packet size spi: spi-geni-qcom: Print an error when we timeout setting the CS spi: spi-geni-qcom: Don't try to set CS if an xfer is pending spi: spi-geni-qcom: Fail new xfers if xfer/cancel/abort pending spi: spi-geni-qcom: Fix geni_spi_isr() NULL dereference in timeout case	2021-01-07 12:21:32 -08:00
Linus Torvalds	a1a7b4f324	regulator: Fixes for v5.11 A few minor driver specific fixes, mostly DT bindings document bits, plus a new device ID. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl/3Pe0ACgkQJNaLcl1U h9Cigwf/QVwn08GK8+fsRCRWT/VffFeK2u9QlgUvqRLy8gFoU/GE2GczbU7yOBmy ELYHxuXBBGXU2tX5o0xE280KzPsRN5IFHRpj5HOGvdtpzaL4ch2xomVttzXoPaCQ i16YcTD5AR+yzYhnY/SWExgVpNx5qbYdYEuYBHf0v1jTF/L+eH5YKYBJibgzMooy tfSNFM8Ie6d1b8UUbMETdaA7UHd0QIqphJj11DFsNJ4wfH+CqjMqI11hTImADTJK tng15k+I9R3kE57I79Cdzpcd2/yl4eyTNbRbqx7non1nMQMx0tsN/wQk5mD2JVXM uYY6Xb2enz2lloNCcGiQLFmgRvXWBw== =sZgX -----END PGP SIGNATURE----- Merge tag 'regulator-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A few minor driver specific fixes, mostly DT bindings document bits, plus a new device ID" * tag 'regulator-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: qcom-rpmh: add QCOM_COMMAND_DB dependency regulator: qcom-rpmh-regulator: correct hfsmps515 definition dt-bindings: regulator: qcom,rpmh-regulator: add pm8009 revision regulator: bd718x7: Add enable times regulator: pf8x00: Use specific compatible strings for devices	2021-01-07 12:18:22 -08:00
Kristian Evensen	2e42338705	qmi_wwan: Increase headroom for QMAP SKBs When measuring the throughput (iperf3 + TCP) while routing on a not-so-powerful device (Mediatek MT7621, 880MHz CPU), I noticed that I achieved significantly lower speeds with QMI-based modems than for example a USB LAN dongle. The CPU was saturated in all of my tests. With the dongle I got ~300 Mbit/s, while I only measured ~200 Mbit/s with the modems. All offloads, etc. were switched off for the dongle, and I configured the modems to use QMAP (16k aggregation). The tests with the dongle were performed in my local (gigabit) network, while the LTE network the modems were connected to delivers 700-800 Mbit/s. Profiling the kernel revealed the cause of the performance difference. In qmimux_rx_fixup(), an SKB is allocated for each packet contained in the URB. This SKB has too little headroom, causing the check in skb_cow() (called from ip_forward()) to fail. pskb_expand_head() is then called and the SKB is reallocated. In the output from perf, I see that a significant amount of time is spent in pskb_expand_head() + support functions. In order to ensure that the SKB has enough headroom, this commit increases the amount of memory allocated in qmimux_rx_fixup() by LL_MAX_HEADER. The reason for using LL_MAX_HEADER and not a more accurate value, is that we do not know the type of the outgoing network interface. After making this change, I achieve the same throughput with the modems as with the dongle. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Link: https://lore.kernel.org/r/20210106122403.1321180-1-kristian.evensen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:05:55 -08:00
Sean Tranchetti	5316a7c013	tools: selftests: add test for changing routes with PTMU exceptions Adds new 2 new tests to the PTMU script: pmtu_ipv4/6_route_change. These tests explicitly test for a recently discovered problem in the IPv6 routing framework where PMTU exceptions were not properly released when replacing a route via "ip route change ...". After creating PMTU exceptions, the route from the device A to R1 will be replaced with a new route, then device A will be deleted. If the PMTU exceptions were properly cleaned up by the kernel, this device deletion will succeed. Otherwise, the unregistration of the device will stall, and messages such as the following will be logged in dmesg: unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 4 Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/1609892546-11389-2-git-send-email-stranche@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:03:36 -08:00
Sean Tranchetti	d8f5c29653	net: ipv6: fib: flush exceptions when purging route Route removal is handled by two code paths. The main removal path is via fib6_del_route() which will handle purging any PMTU exceptions from the cache, removing all per-cpu copies of the DST entry used by the route, and releasing the fib6_info struct. The second removal location is during fib6_add_rt2node() during a route replacement operation. This path also calls fib6_purge_rt() to handle cleaning up the per-cpu copies of the DST entries and releasing the fib6_info associated with the older route, but it does not flush any PMTU exceptions that the older route had. Since the older route is removed from the tree during the replacement, we lose any way of accessing it again. As these lingering DSTs and the fib6_info struct are holding references to the underlying netdevice struct as well, unregistering that device from the kernel can never complete. Fixes: `2b760fcf5c` ("ipv6: hook up exception table to store dst cache") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/1609892546-11389-1-git-send-email-stranche@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 12:03:16 -08:00
Linus Torvalds	fc37784dc7	regmap: Fixes for v5.11 A couple of small fixes for leaks when attaching a device to a preexisting regmap. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl/3OOIACgkQJNaLcl1U h9BHLwf/Q5iRceYjgTZT3EbDqtkf4eX60dwoaoCdSBOtzdzgX1kYdFI2xTeqjMTh sXKTA/OfapyWQrPIWW4vx5jkrum9Wz7hoQJhxp6jLl/lSj2c1TR3U7Nx+b2OA4Ap mPrdlfxHF47jZtsjfPtVUwNyb456ZL94H0IX/kefIUVp8uQusDFBoYLefi+T3+PA cRKlJqpcPpq2amZnXYCRhi8FpFPhFZezfAiddBzpab3xE+RFihgWl9NV2kuk99nR 9SD5ZhoDQF5IQnRXhfHGp2vhXFTk+BgpDuj+TFsF9TkWNgH8gtEQwmskvf7MJFvd Bihwcb8I7Me3ZPE7NI+8YqJEHLXnJg== =2RSs -----END PGP SIGNATURE----- Merge tag 'regmap-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fixes from Mark Brown: "A couple of small fixes for leaks when attaching a device to a preexisting regmap" * tag 'regmap-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: debugfs: Fix a reversed if statement in regmap_debugfs_init() regmap: debugfs: Fix a memory leak when calling regmap_attach_dev	2021-01-07 11:57:56 -08:00
Jakub Kicinski	c8c748fb83	linux-can-fixes-for-5.11-20210107 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl/23DITHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqcz4CACO2NjIJ6YwQhId/W6iIpmA/tjCwnDZ 3aMtwM6kmNmW1bUqemfThwwznD/f+G9JBo5WH+jY+Hx1YPm3Xeb/H7vx/kve/9Bl LdknrYbo5CBIAw20FPMwhmu3+4jR0d5dVcjHliHY8IgyX0sDXlgQs29HD0QlVNN0 /B1gS/OBAktVvIkqodMRie0SiR4iNz5ruqERfDNkauA8o4RQN6G/F5LUpUr/xNia zV/2zSiKSP7Jtni207qiBaXIuUgvudA+4N0ZxNRtWNsZA1fOg4LdOK50Ru5UDk6N 2lPYaiE5Eq9r9cPKhzK5BxuAoqGmmR033Rvg3RAqLExlk6bzTBiOHq9D =KPJM -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.11-20210107' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-01-07 The first patch is by me for the m_can driver and removes an erroneous m_can_clk_stop() from the driver's unregister function. The second patch targets the tcan4x5x driver, is by me, and fixes the bit timing constant parameters. The next two patches are by me, target the mcp251xfd driver, and fix a race condition in the optimized TEF path (which was added in net-next for v5.11). The similar code in the RX path is changed to look the same, although it doesn't suffer from the race condition. A patch by Lad Prabhakar updates the description and help text for the rcar CAN driver to reflect all supported SoCs. In the last patch Sriram Dash transfers the maintainership of the m_can driver to Pankaj Sharma. * tag 'linux-can-fixes-for-5.11-20210107' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: MAINTAINERS: Update MCAN MMIO device driver maintainer can: rcar: Kconfig: update help description for CAN_RCAR config can: mcp251xfd: mcp251xfd_handle_rxif_ring(): first increment RX tail pointer in HW, then in driver can: mcp251xfd: mcp251xfd_handle_tefif(): fix TEF vs. TX race condition can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver can: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop() ==================== Link: https://lore.kernel.org/r/20210107103451.183477-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 11:08:08 -08:00
Jakub Kicinski	c10b377ff6	linux-can-next-for-5.12-20210106 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl/12xATHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqcB6B/4xTRQCOLF1BN59AdixqGf9gdXjIpQs R5h4WJXqh9OPS4XSckSVBVGYniT8KdfBLL0IY4NV8q2s6R6Yo70G3GTiaYXI4cX/ BaKMRcvX6p1i2qZxTtHDGYxucnKCd+yRwfWkL4GLsCNcmTWCkqUL+nGgx6UmS6eB 9fMHcLXiP9vSfJQUfAOLR33Bu2C0QslI5fX5hi8PIdKEz9cHGf3PAuaRvtWEWi3a 1Zi/iXrv/0XDcT+OE5/2tAU6n8slIA2MU/80r7KdgUrWXOfuueH/pqepYgVU0Nf2 eFH+gDYBlZucd4ef37TPIAdPTw3z4F9irOVI1iwRIWdK3aQ3jyinsVWF =0lW0 -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.12-20210106' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2021-01-06 The first 16 patches are by me and target the tcan4x5x SPI glue driver for the m_can CAN driver. First there are a several cleanup commits, then the SPI regmap part is converted to 8 bits per word, to make it possible to use that driver on SPI controllers that only support the 8 bit per word mode (such as the SPI cores on the raspberry pi). Oliver Hartkopp contributes a patch for the CAN_RAW protocol. The getsockopt() for CAN_RAW_FILTER is changed to return -ERANGE if the filterset does not fit into the provided user space buffer. The last two patches are by Joakim Zhang and add wakeup support to the flexcan driver for the i.MX8QM SoC. The dt-bindings docs are extended to describe the added property. * tag 'linux-can-next-for-5.12-20210106' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: flexcan: add CAN wakeup function for i.MX8QM dt-bindings: can: fsl,flexcan: add fsl,scu-index property to indicate a resource can: raw: return -ERANGE when filterset does not fit into user space buffer can: tcan4x5x: add support for half-duplex controllers can: tcan4x5x: rework SPI access can: tcan4x5x: add {wr,rd}_table can: tcan4x5x: add max_raw_{read,write} of 256 can: tcan4x5x: tcan4x5x_regmap: set reg_stride to 4 can: tcan4x5x: fix max register value can: tcan4x5x: tcan4x5x_regmap_init(): use spi as context pointer can: tcan4x5x: tcan4x5x_regmap_write(): remove not needed casts and replace 4 by sizeof can: tcan4x5x: rename regmap_spi_gather_write() -> tcan4x5x_regmap_gather_write() can: tcan4x5x: remove regmap async support can: tcan4x5x: tcan4x5x_bus: remove not needed read_flag_mask can: tcan4x5x: mark struct regmap_bus tcan4x5x_bus as constant can: tcan4x5x: move regmap code into seperate file can: tcan4x5x: rename tcan4x5x.c -> tcan4x5x-core.c can: tcan4x5x: beautify indention of tcan4x5x_of_match and tcan4x5x_id_table can: tcan4x5x: replace DEVICE_NAME by KBUILD_MODNAME ==================== Link: https://lore.kernel.org/r/20210107094900.173046-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 10:53:32 -08:00

1 2 3 4 5 ...

982503 Commits All Branches Search

982503 Commits

All Branches