HW error and global reset are reported through MSIX interrupts.
The same error may be reported to different functions at the
same time. When global reset begins, the pending reset request
set by this error is unnecessary. So clear the pending reset
request after the reset is complete to avoid the repeated reset.
Fixes: f6162d4412 ("net: hns3: add handling of hw errors reported through MSIX")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, netif_tx_stop_all_queues() is used to ensure that
the xmit is not running, but for the concurrent case it will
not take effect, since netif_tx_stop_all_queues() just sets
a flag without locking to indicate that the xmit queue(s)
should not be run.
So use netif_tx_disable() to replace netif_tx_stop_all_queues(),
it takes the xmit queue lock while marking the queue stopped.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When skb->ip_summed is CHECKSUM_PARTIAL, for non-tunnel udp packet,
which has a dest port as the IANA assigned, the hardware is expected
to do the checksum offload, but the hardware whose version is below
V3 will not do the checksum offload when udp dest port is 4790.
So fixes it by doing the checksum in software for this case.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An issue found when network interface is down and up again, FPE handshake
fails to trigger. This is due to __FPE_REMOVING bit remains being set in
stmmac_fpe_stop_wq() but not cleared in stmmac_fpe_start_wq(). This
cause FPE workqueue task, stmmac_fpe_lp_task() not able to be executed.
To fix this, add clearing __FPE_REMOVING bit in stmmac_fpe_start_wq().
Fixes: 5a5586112b ("net: stmmac: support FPE link partner hand-shaking procedure")
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers pass 0 as offset. Therefore remove the parameter and use a
fixed offset 0 in pci_vpd_find_tag().
Link: https://lore.kernel.org/r/f62e6e19-5423-2ead-b2bd-62844b23ef8f@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Variable 'err' is set to -EIO but this value is never read as it is
overwritten with a new value later on, hence it is a redundant
assignment and can be removed.
Clean up the following clang-analyzer warning:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1195:2: warning: Value
stored to 'err' is never read [clang-analyzer-deadcode.DeadStores]
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable queue is set to bp->queues but these values is not used as it
is overwritten later on, hence redundant assignment can be removed.
Cleans up the following clang-analyzer warning:
drivers/net/ethernet/cadence/macb_main.c:4919:21: warning: Value stored
to 'queue' during its initialization is never read
[clang-analyzer-deadcode.DeadStores].
drivers/net/ethernet/cadence/macb_main.c:4832:21: warning: Value stored
to 'queue' during its initialization is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, the device is not initialized because reset failed.
If another task calls hns3_reset_notify_up_enet() before reset
retry, it will cause an error since uninitialized pointer access.
So add check for HNS3_NIC_STATE_INITED before calling
hns3_nic_net_open() in hns3_reset_notify_up_enet().
Fixes: bb6b94a896 ("net: hns3: Add reset interface implementation in client")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The message sent to VF should be initialized, otherwise random
value of some contents may cause improper processing by the target.
So add a initialization to message in hclge_get_link_mode().
Fixes: 9194d18b05 ("net: hns3: fix the problem that the supported port is empty")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the UM, the type and enable status of igu_egu_hw_err
should be configured separately. Currently, the type field is
incorrect when disable this error. So fix it by configuring these
two fields separately.
Fixes: bf1faf9415 ("net: hns3: Add enable and process hw errors from IGU, EGU and NCSI")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Core:
- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to
walk all map elements in a more robust and easier to verify
fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF
on s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets
- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks
- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices
which don't need headers in the linear skb part (e.g. virtio)
- nexthop: resilient next-hop groups - improve path stability
on next-hops group changes (incl. offload for mlxsw)
- ipv6: segment routing: add support for IPv4 decapsulation
- icmp: add support for RFC 8335 extended PROBE messages
- inet: use bigger hash table for IP ID generation
- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is
slow in reporting that it completed transmitting the original
- tcp: reorder tcp_congestion_ops for better cache locality
- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow
- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take
place correctly even for encapsulated UDP traffic
- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO
- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls
- veth: allow GRO without XDP - this allows aggregating UDP
packets before handing them off to routing, bridge, OvS, etc.
- allow specifing ifindex when device is moved to another namespace
- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used
to define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily
- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic
- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing
Device APIs:
- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
-independent APIs
- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and
bnxt support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP
which define EEPROM in terms of pages (incl. mlx5 support)
- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)
- psample: add additional metadata attributes like transit delay
for packets sampled from switch HW (and corresponding egress
and policy-based sampling in the mlxsw driver)
- dsa: improve support for sandwiched LAGs with bridge and DSA
- netfilter:
- flowtable: use direct xmit in topologies with IP
forwarding, bridging, vlans etc.
- nftables: counter hardware offload support
- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver
- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames
- phy: add support for Clause-45 PHY Loopback
- pci/iov: add sysfs MSI-X vector assignment interface
to distribute MSI-X resources to VFs (incl. mlx5 support)
New hardware/drivers:
- dsa: mv88e6xxx: add support for Marvell mv88e6393x -
11-port Ethernet switch with 8x 1-Gigabit Ethernet
and 3x 10-Gigabit interfaces.
- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365
and BCM63xx switches
- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches
- ath11k: support for QCN9074 a 802.11ax device
- Bluetooth: Broadcom BCM4330 and BMC4334
- phy: Marvell 88X2222 transceiver support
- mdio: add BCM6368 MDIO mux bus controller
- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips
- mana: driver for Microsoft Azure Network Adapter (MANA)
- Actions Semi Owl Ethernet MAC
- can: driver for ETAS ES58X CAN/USB interfaces
Pure driver changes:
- add XDP support to: enetc, igc, stmmac
- add AF_XDP support to: stmmac
- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary
- mlx5:
- flow rules: add support for mirroring with conntrack,
matching on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode
changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting
- ice, iavf: support flow filters, UDP Segmentation Offload
- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic
- ionic:
- implement Rx page reuse
- support HW PTP time-stamping
- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.
- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment
- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping
- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.
- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)
- mt7601u: enable TDLS support
- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmCKFPIACgkQMUZtbf5S
Irtw0g/+NA8bWdHNgG4H5rya0pv2z3IieLRmSdDfKRQQXcJpklawc5MKVVaTee/Q
5/QqgPdCsu1LAU6JXBKsKmyDDaMlQKdWuKbOqDSiAQKoMesZStTEHf9d851ZzgxA
Cdb6O7BD3lBl/IN+oxNG+KcmD1LKquTPKGySq2mQtEdLO12ekAsranzmj4voKffd
q9tBShpXQ7Dq77DLYfiQXVCvsizNcbbJFuxX0o9Lpb9+61ZyYAbogZSa9ypiZZwR
I/9azRBtJg7UV1aD/cLuAfy66Qh7t63+rCxVazs5Os8jVO26P/jQdisnnOe/x+p9
wYEmKm3GSu0V4SAPxkWW+ooKusflCeqDoMIuooKt6kbP6BRj540veGw3Ww/m5YFr
7pLQkTSP/tSjuGQIdBE1LOP5LBO8DZeC8Kiop9V0fzAW9hFSZbEq25WW0bPj8QQO
zA4Z7yWlslvxcfY2BdJX3wD8klaINkl/8fDWZFFsBdfFX2VeLtm7Xfduw34BJpvU
rYT3oWr6PhtkPAKR32SUcemSfeWgIVU41eSshzRz3kez1NngBUuLlSGGSEaKbes5
pZVt6pYFFVByyf6MTHFEoQvafZfEw04JILZpo4R5V8iTHzom0kD3Py064sBiXEw2
B6t+OW4qgcxGblpFkK2lD4kR2s1TPUs0ckVO6sAy1x8q60KKKjY=
=vcbA
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to walk
all map elements in a more robust and easier to verify fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF on
s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets
- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks
- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices which don't
need headers in the linear skb part (e.g. virtio)
- nexthop: resilient next-hop groups - improve path stability on
next-hops group changes (incl. offload for mlxsw)
- ipv6: segment routing: add support for IPv4 decapsulation
- icmp: add support for RFC 8335 extended PROBE messages
- inet: use bigger hash table for IP ID generation
- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is slow in
reporting that it completed transmitting the original
- tcp: reorder tcp_congestion_ops for better cache locality
- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow
- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take place
correctly even for encapsulated UDP traffic
- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO
- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls
- veth: allow GRO without XDP - this allows aggregating UDP packets
before handing them off to routing, bridge, OvS, etc.
- allow specifing ifindex when device is moved to another namespace
- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used to
define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily
- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic
- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing
Device APIs:
- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
independent APIs
- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and bnxt
support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP which
define EEPROM in terms of pages (incl. mlx5 support)
- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)
- psample: add additional metadata attributes like transit delay for
packets sampled from switch HW (and corresponding egress and
policy-based sampling in the mlxsw driver)
- dsa: improve support for sandwiched LAGs with bridge and DSA
- netfilter:
- flowtable: use direct xmit in topologies with IP forwarding,
bridging, vlans etc.
- nftables: counter hardware offload support
- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver
- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames
- phy: add support for Clause-45 PHY Loopback
- pci/iov: add sysfs MSI-X vector assignment interface to distribute
MSI-X resources to VFs (incl. mlx5 support)
New hardware/drivers:
- dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port
Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit
interfaces.
- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and
BCM63xx switches
- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches
- ath11k: support for QCN9074 a 802.11ax device
- Bluetooth: Broadcom BCM4330 and BMC4334
- phy: Marvell 88X2222 transceiver support
- mdio: add BCM6368 MDIO mux bus controller
- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips
- mana: driver for Microsoft Azure Network Adapter (MANA)
- Actions Semi Owl Ethernet MAC
- can: driver for ETAS ES58X CAN/USB interfaces
Pure driver changes:
- add XDP support to: enetc, igc, stmmac
- add AF_XDP support to: stmmac
- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary
- mlx5:
- flow rules: add support for mirroring with conntrack, matching
on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting
- ice, iavf: support flow filters, UDP Segmentation Offload
- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic
- ionic:
- implement Rx page reuse
- support HW PTP time-stamping
- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.
- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment
- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping
- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.
- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)
- mt7601u: enable TDLS support
- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes"
* tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits)
net: selftest: fix build issue if INET is disabled
net: netrom: nr_in: Remove redundant assignment to ns
net: tun: Remove redundant assignment to ret
net: phy: marvell: add downshift support for M88E1240
net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255
net/sched: act_ct: Remove redundant ct get and check
icmp: standardize naming of RFC 8335 PROBE constants
bpf, selftests: Update array map tests for per-cpu batched ops
bpf: Add batched ops support for percpu array
bpf: Implement formatted output helpers with bstr_printf
seq_file: Add a seq_bprintf function
sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
net:nfc:digital: Fix a double free in digital_tg_recv_dep_req
net: fix a concurrency bug in l2tp_tunnel_register()
net/smc: Remove redundant assignment to rc
mpls: Remove redundant assignment to err
llc2: Remove redundant assignment to rc
net/tls: Remove redundant initialization of record
rds: Remove redundant assignment to nr_sig
dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0
...
In case ethernet driver is enabled and INET is disabled, selftest will
fail to build.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 3e1e58d64c ("net: add generic selftest support")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210428130947.29649-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Although HWTSTAMP_TX_ONESTEP_SYNC existed in ioctl for hardware timestamp
configuration, the PTP Sync one-step timestamping had never been supported.
This patch is to truely support it.
- ocelot_port_txtstamp_request()
This function handles tx timestamp request by storing
ptp_cmd(tx timestamp type) in OCELOT_SKB_CB(skb)->ptp_cmd,
and additionally for two-step timestamp storing ts_id in
OCELOT_SKB_CB(clone)->ptp_cmd.
- ocelot_ptp_rew_op()
During xmit, this function is called to get rew_op (rewriter option) by
checking skb->cb for tx timestamp request, and configure to transmitting.
Non-onestep-Sync packet with one-step timestamp request falls back to use
two-step timestamp.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to a common ocelot_port_txtstamp_request() for TX timestamp
request handling.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Free skb->cb usage in core driver and let device drivers decide to
use or not. The reason having a DSA_SKB_CB(skb)->clone was because
dsa_skb_tx_timestamp() which may set the clone pointer was called
before p->xmit() which would use the clone if any, and the device
driver has no way to initialize the clone pointer.
This patch just put memset(skb->cb, 0, sizeof(skb->cb)) at beginning
of dsa_slave_xmit(). Some new features in the future, like one-step
timestamp may need more bytes of skb->cb to use in
dsa_skb_tx_timestamp(), and p->xmit().
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In emac_mac_tx_buf_send, it calls emac_tx_fill_tpd(..,skb,..).
If some error happens in emac_tx_fill_tpd(), the skb will be freed via
dev_kfree_skb(skb) in error branch of emac_tx_fill_tpd().
But the freed skb is still used via skb->len by netdev_sent_queue(,skb->len).
As i observed that emac_tx_fill_tpd() haven't modified the value of skb->len,
thus my patch assigns skb->len to 'len' before the possible free and
use 'len' instead of skb->len later.
Fixes: b9b17debc6 ("net: emac: emac gigabit ethernet controller driver")
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable ret is set to zero but this value is never read as it is
overwritten with a new value later on, hence it is a redundant
assignment and can be removed.
Cleans up the following clang-analyzer warning:
drivers/net/ethernet/davicom/dm9000.c:1527:5: warning: Value stored to
'ret' is never read [clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable prev_link and curr_link is being assigned a value from a
calculation however the variable is never read, so this redundant
variable can be removed.
Cleans up the following clang-analyzer warning:
drivers/net/ethernet/amd/pcnet32.c:2857:4: warning: Value stored to
'prev_link' is never read [clang-analyzer-deadcode.DeadStores].
drivers/net/ethernet/amd/pcnet32.c:2856:4: warning: Value stored to
'curr_link' is never read [clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updates for SoC specific drivers include a few subsystems that
have their own maintainers but send them through the soc tree:
TEE/OP-TEE:
- Add tracepoints around calls to secure world
Memory controller drivers:
- Minor fixes for Renesas, Exynos, Mediatek and Tegra platforms
- Add debug statistics to Tegra20 memory controller
- Update Tegra bindings and convert to dtschema
ARM SCMI Firmware:
- Support for modular SCMI protocols and vendor specific extensions
- New SCMI IIO driver
- Per-cpu DVFS
The other driver changes are all from the platform maintainers
directly and reflect the drivers that don't fit into any other
subsystem as well as treewide changes for a particular platform.
SoCFPGA:
- Various cleanups contributed by Krzysztof Kozlowski
Mediatek:
- add MT8183 support to mutex driver
- MMSYS: use per SoC array to describe the possible routing
- add MMSYS support for MT8183 and MT8167
- add support for PMIC wrapper with integrated arbiter
- add support for MT8192/MT6873
Tegra:
- Bug fixes to PMC and clock drivers
NXP/i.MX:
- Update SCU power domain driver to keep console domain power on.
- Add missing ADC1 power domain to SCU power domain driver.
- Update comments for single global power domain in SCU power domain
driver.
- Add i.MX51/i.MX53 unique id support to i.MX SoC driver.
NXP/FSL SoC driver updates for v5.13
- Add ACPI support for RCPM driver
- Use generic io{read,write} for QE drivers after performance optimized
for PowerPC
- Fix QBMAN probe to cleanup HW states correctly for kexec
- Various cleanup and style fix for QBMAN/QE/GUTS drivers
OMAP:
- Preparation to use devicetree for genpd
- ti-sysc needs iorange check improved when the interconnect target module
has no control registers listed
- ti-sysc needs to probe l4_wkup and l4_cfg interconnects first to avoid
issues with missing resources and unnecessary deferred probe
- ti-sysc debug option can now detect more devices
- ti-sysc now warns if an old incomplete devicetree data is found as we
now rely on it being complete for am3 and 4
- soc init code needs to check for prcm and prm nodes for omap4/5 and dra7
- omap-prm driver needs to enable autoidle retention support for omap4
- omap5 clocks are missing gpmc and ocmc clock registers
- pci-dra7xx now needs to use builtin_platform_driver instead of using
builtin_platform_driver_probe for deferred probe to work
Raspberry Pi:
- Fix-up all RPi firmware drivers so as for unbind to happen in an
orderly fashion
- Support for RPi's PoE hat PWM bus
Qualcomm
- Improved detection for SCM calling conventions
- Support for OEM specific wifi firmware path
- Added drivers for SC7280/SM8350: RPMH, LLCC< AOSS QMP
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmCC2JwACgkQmmx57+YA
GNkgRg//cBtq2NyDbjiNABxFSkmGCfcc0w0C2wjVzr4cfg6BLTbuvvlpZxI912pu
P1G2sbsdfQJ8sSeIyZos+PilWK0zHrqlaGZfKI19US45dMjpteDBgsPd7wNZwBjQ
jbops3YLjztZK1HpY4dIdvMnfxt7yRqhBWaTbPuCwQ35c5KsOM8NHB3cP3BUINWK
x1uuBCv9svppzwdDiPxneV93WKEzabOUo+WBMPyh5vnyvmW17Iif4BA/VKQxzymm
mWUi8HHpKBpvntJOKwAD2hnLAdpR3SwX20SLOpyLhnJMotbzNUEqq3LdRxDNPdHk
ry+rarJ78JGlYfpcfegf2bLf5ITNMfOyRGkjtzeYpcZIXPjufOg9DA9YtAy37k0u
L0T/9gQ+tQ01WGMca77OyUtIqJKdblZrQMfuH/yGlR99bqFQMV7rNc7GNlX1MXp/
zw4aOYrRWGtGEeAjx5JJWcYydvMSJpCrqxTz3YhgeJECHB2iA6YkV3NROR4TLW//
tfxaKqxR/KmSqE6hoVOAuuQ0BLXNlql/+4EE6MKsAOBiKPJclvmJg4CyuY8G21ev
9Su0zJnXMzai7gNu32v1pizGj26+AOhxCEgAG0mGgk2jlQSn24CKgm5e7kCUewcF
j/1XksNPT95v/K8MsLpXe5xGvF3jhA1BlFfvjJNZOrcZywBXRxg=
=iidq
-----END PGP SIGNATURE-----
Merge tag 'arm-drivers-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
"Updates for SoC specific drivers include a few subsystems that have
their own maintainers but send them through the soc tree:
TEE/OP-TEE:
- Add tracepoints around calls to secure world
Memory controller drivers:
- Minor fixes for Renesas, Exynos, Mediatek and Tegra platforms
- Add debug statistics to Tegra20 memory controller
- Update Tegra bindings and convert to dtschema
ARM SCMI Firmware:
- Support for modular SCMI protocols and vendor specific extensions
- New SCMI IIO driver
- Per-cpu DVFS
The other driver changes are all from the platform maintainers
directly and reflect the drivers that don't fit into any other
subsystem as well as treewide changes for a particular platform.
SoCFPGA:
- Various cleanups contributed by Krzysztof Kozlowski
Mediatek:
- add MT8183 support to mutex driver
- MMSYS: use per SoC array to describe the possible routing
- add MMSYS support for MT8183 and MT8167
- add support for PMIC wrapper with integrated arbiter
- add support for MT8192/MT6873
Tegra:
- Bug fixes to PMC and clock drivers
NXP/i.MX:
- Update SCU power domain driver to keep console domain power on.
- Add missing ADC1 power domain to SCU power domain driver.
- Update comments for single global power domain in SCU power domain
driver.
- Add i.MX51/i.MX53 unique id support to i.MX SoC driver.
NXP/FSL SoC driver updates for v5.13
- Add ACPI support for RCPM driver
- Use generic io{read,write} for QE drivers after performance
optimized for PowerPC
- Fix QBMAN probe to cleanup HW states correctly for kexec
- Various cleanup and style fix for QBMAN/QE/GUTS drivers
OMAP:
- Preparation to use devicetree for genpd
- ti-sysc needs iorange check improved when the interconnect target
module has no control registers listed
- ti-sysc needs to probe l4_wkup and l4_cfg interconnects first to
avoid issues with missing resources and unnecessary deferred probe
- ti-sysc debug option can now detect more devices
- ti-sysc now warns if an old incomplete devicetree data is found as
we now rely on it being complete for am3 and 4
- soc init code needs to check for prcm and prm nodes for omap4/5 and
dra7
- omap-prm driver needs to enable autoidle retention support for
omap4
- omap5 clocks are missing gpmc and ocmc clock registers
- pci-dra7xx now needs to use builtin_platform_driver instead of
using builtin_platform_driver_probe for deferred probe to work
Raspberry Pi:
- Fix-up all RPi firmware drivers so as for unbind to happen in an
orderly fashion
- Support for RPi's PoE hat PWM bus
Qualcomm
- Improved detection for SCM calling conventions
- Support for OEM specific wifi firmware path
- Added drivers for SC7280/SM8350: RPMH, LLCC< AOSS QMP"
* tag 'arm-drivers-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
soc: aspeed: fix a ternary sign expansion bug
memory: mtk-smi: Add device-link between smi-larb and smi-common
memory: samsung: exynos5422-dmc: handle clk_set_parent() failure
memory: renesas-rpc-if: fix possible NULL pointer dereference of resource
clk: socfpga: fix iomem pointer cast on 64-bit
soc: aspeed: Adapt to new LPC device tree layout
pinctrl: aspeed-g5: Adapt to new LPC device tree layout
ipmi: kcs: aspeed: Adapt to new LPC DTS layout
ARM: dts: Remove LPC BMC and Host partitions
dt-bindings: aspeed-lpc: Remove LPC partitioning
soc: fsl: enable acpi support in RCPM driver
soc: qcom: mdt_loader: Detect truncated read of segments
soc: qcom: mdt_loader: Validate that p_filesz < p_memsz
soc: qcom: pdr: Fix error return code in pdr_register_listener
firmware: qcom_scm: Fix kernel-doc function names to match
firmware: qcom_scm: Suppress sysfs bind attributes
firmware: qcom_scm: Workaround lack of "is available" call on SC7180
firmware: qcom_scm: Reduce locking section for __get_convention()
firmware: qcom_scm: Make __qcom_scm_is_call_available() return bool
Revert "soc: fsl: qe: introduce qe_io{read,write}* wrappers"
...
Core changes:
- Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can be
cleaned up which either use a seperate mechanism to prevent auto-enable
at request time or have a racy mechanism which disables the interrupt
right after request.
- Get rid of the last usage of irq_create_identity_mapping() and remove
the interface.
- An overhaul of tasklet_disable(). Most usage sites of tasklet_disable()
are in task context and usually in cleanup, teardown code pathes.
tasklet_disable() spinwaits for a tasklet which is currently executed.
That's not only a problem for PREEMPT_RT where this can lead to a live
lock when the disabling task preempts the softirq thread. It's also
problematic in context of virtualization when the vCPU which runs the
tasklet is scheduled out and the disabling code has to spin wait until
it's scheduled back in. Though there are a few code pathes which invoke
tasklet_disable() from non-sleepable context. For these a new disable
variant which still spinwaits is provided which allows to switch
tasklet_disable() to a sleep wait mechanism. For the atomic use cases
this does not solve the live lock issue on PREEMPT_RT. That is mitigated
by blocking on the RT specific softirq lock.
- The PREEMPT_RT specific implementation of softirq processing and
local_bh_disable/enable().
On RT enabled kernels soft interrupt processing happens always in task
context and all interrupt handlers, which are not explicitly marked to
be invoked in hard interrupt context are forced into task context as
well. This allows to protect against softirq processing with a per
CPU lock, which in turn allows to make BH disabled regions preemptible.
Most of the softirq handling code is still shared. The RT/non-RT
specific differences are addressed with a set of inline functions which
provide the context specific functionality. The local_bh_disable() /
local_bh_enable() mechanism are obviously seperate.
- The usual set of small improvements and cleanups
Driver changes:
- New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt controllers
- Extended functionality for MStar, STM32 and SC7280 irq chips
- Enhanced robustness for ARM GICv3/4.1 drivers
- The usual set of cleanups and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGh5wTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZ+/EACWBpQ/2ZHizEw1bzjaDzJrR8U228xu
wNi7nSP92Y07nJ3cCX7a6TJ53mqd0n3RT+DprlsOuqSN0D7Ktr/x44V/aZtm0d3N
GkFOlpeGCRnHusLaUTwk7a8289LuoQ7OhSxIB409n1I4nLI96ZK41D1tYonMYl6E
nxDiGADASfjaciBWbjwJO/mlwmiW/VRpSTxswx0wzakFfbIx9iKyKv1bCJQZ5JK+
lHmf0jxpDIs1EVK/ElJ9Ky6TMBlEmZyiX7n6rujtwJ1W+Jc/uL/y8pLJvGwooVmI
yHTYsLMqzviCbAMhJiB3h1qs3GbCGlM78prgJTnOd0+xEUOCcopCRQlsTXVBq8Nb
OS+HNkYmYXRfiSH6lINJsIok8Xis28bAw/qWz2Ho+8wLq0TI8crK38roD1fPndee
FNJRhsPPOBkscpIldJ0Cr0X5lclkJFiAhAxORPHoseKvQSm7gBMB7H99xeGRffTn
yB3XqeTJMvPNmAHNN4Brv6ey3OjwnEWBgwcnIM2LtbIlRtlmxTYuR+82OPOgEvzk
fSrjFFJqu0LEMLEOXS4pYN824PawjV//UAy4IaG8AodmUUCSGHgw1gTVa4sIf72t
tXY54HqWfRWRpujhVRgsZETqBUtZkL6yvpoe8f6H7P91W5tAfv3oj4ch9RkhUo+Z
b0/u9T0+Fpbg+w==
=id4G
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The usual updates from the irq departement:
Core changes:
- Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can
be cleaned up which either use a seperate mechanism to prevent
auto-enable at request time or have a racy mechanism which disables
the interrupt right after request.
- Get rid of the last usage of irq_create_identity_mapping() and
remove the interface.
- An overhaul of tasklet_disable().
Most usage sites of tasklet_disable() are in task context and
usually in cleanup, teardown code pathes. tasklet_disable()
spinwaits for a tasklet which is currently executed. That's not
only a problem for PREEMPT_RT where this can lead to a live lock
when the disabling task preempts the softirq thread. It's also
problematic in context of virtualization when the vCPU which runs
the tasklet is scheduled out and the disabling code has to spin
wait until it's scheduled back in.
There are a few code pathes which invoke tasklet_disable() from
non-sleepable context. For these a new disable variant which still
spinwaits is provided which allows to switch tasklet_disable() to a
sleep wait mechanism. For the atomic use cases this does not solve
the live lock issue on PREEMPT_RT. That is mitigated by blocking on
the RT specific softirq lock.
- The PREEMPT_RT specific implementation of softirq processing and
local_bh_disable/enable().
On RT enabled kernels soft interrupt processing happens always in
task context and all interrupt handlers, which are not explicitly
marked to be invoked in hard interrupt context are forced into task
context as well. This allows to protect against softirq processing
with a per CPU lock, which in turn allows to make BH disabled
regions preemptible.
Most of the softirq handling code is still shared. The RT/non-RT
specific differences are addressed with a set of inline functions
which provide the context specific functionality. The
local_bh_disable() / local_bh_enable() mechanism are obviously
seperate.
- The usual set of small improvements and cleanups
Driver changes:
- New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt
controllers
- Extended functionality for MStar, STM32 and SC7280 irq chips
- Enhanced robustness for ARM GICv3/4.1 drivers
- The usual set of cleanups and improvements all over the place"
* tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP
irqchip/gic-v3: Do not enable irqs when handling spurious interrups
dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller
irqchip: Add support for IDT 79rc3243x interrupt controller
irqdomain: Drop references to recusive irqdomain setup
irqdomain: Get rid of irq_create_strict_mappings()
irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
ARM: PXA: Kill use of irq_create_strict_mappings()
irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection
irqchip/tb10x: Use 'fallthrough' to eliminate a warning
genirq: Reduce irqdebug cacheline bouncing
kernel: Initialize cpumask before parsing
irqchip/wpcm450: Drop COMPILE_TEST
irqchip/irq-mst: Support polarity configuration
irqchip: Add driver for WPCM450 interrupt controller
dt-bindings: interrupt-controller: Add nuvoton, wpcm450-aic
dt-bindings: qcom,pdc: Add compatible for sc7280
irqchip/stm32: Add usart instances exti direct event support
irqchip/gic-v3: Fix OF_BAD_ADDR error handling
irqchip/sifive-plic: Mark two global variables __ro_after_init
...
For UDP encapsultions, we only support the offloaded Vxlan port and
Geneve port. All other ports included FOU and GUE are not supported so
we need to turn off TSO and checksum features.
v2: Reverse the check for supported UDP ports to be more straight forward.
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If firmware is capable, set the IFF_SUPP_NOFCS flag to support the
sockets option to transmit packets without FCS. This is mainly used
for testing.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support VF device IDs used by the Hyper-V hypervisor.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the PF is no longer enforcing an assigned MAC address on a VF, the
VF needs to call bnxt_approve_mac() to tell the PF what MAC address it is
now using. Otherwise it gets out of sync and the PF won't know what
MAC address the VF wants to use. Ultimately the VF will fail when it
tries to setup the L2 MAC filter for the vnic.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move it before bnxt_update_vf_mac(). In the next patch, we need to call
bnxt_approve_mac() from bnxt_update_mac() under some conditions. This
will avoid forward declaration.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is perfectly legal for the stack to query and configure VFs via PF
NDOs while the NIC is administratively down. Remove the unnecessary
check for the PF to be in open state.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware previously only allowed promiscuous mode for VFs associated with
a default VLAN. It is now possible to enable promiscuous mode for a VF
having no VLAN configured provided that it is trusted. In such cases the
VF will see all packets received by the PF, irrespective of destination
MAC or VLAN.
Note, it is necessary to query firmware at the time of bnxt_promisc_ok()
instead of in bnxt_hwrm_func_qcfg() because the trusted status might be
altered by the PF after the VF has been configured. This check must now
also be deferred because the firmware call sleeps.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current code, the driver will not shutdown the link during
IFDOWN if there are still VFs sharing the port. Newer firmware will
manage the link down decision when the port is shared by VFs, so
we can just call firmware to shutdown the port unconditionally and
let firmware make the final decision.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Copy the phy related feature flags from the firmware call
HWRM_PORT_PHY_QCAPS to this new field. We can also remove the flags
field in the bnxt_test_info structure. It's cleaner to have all PHY
related flags in one location, directly copied from the firmware.
To keep the BNXT_PHY_CFG_ABLE() macro logic the same, we need to make
a slight adjustment to check that it is a PF.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware reports link signalling mode for certain speeds. In these
cases, print the signalling modes in kernel log link up messages.
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
devlink external port attribute for SF (Sub-Function) port flavour
This adds the support to instantiate Sub-Functions on external hosts
E.g when Eswitch manager is enabled on the ARM SmarNic SoC CPU, users
are now able to spawn new Sub-Functions on the Host server CPU.
Parav Pandit Says:
==================
This series introduces and uses external attribute for the SF port to
indicate that a SF port belongs to an external controller.
This is needed to generate unique phys_port_name when PF and SF numbers
are overlapping between local and external controllers.
For example two controllers 0 and 1, both of these controller have a SF.
having PF number 0, SF number 77. Here, phys_port_name has duplicate
entry which doesn't have controller number in it.
Hence, add controller number optionally when a SF port is for an
external controller. This extension is similar to existing PF and VF
eswitch ports of the external controller.
When a SF is for external controller an example view of external SF
port and config sequence:
On eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
phys_port_name construction:
$ cat /sys/class/net/eth1/phys_port_name
c1pf0sf77
Patch summary:
First 3 patches prepares the eswitch to handle vports in more generic
way using xarray to lookup vport from its unique vport number.
Patch-1 returns maximum eswitch ports only when eswitch is enabled
Patch-2 prepares eswitch to return eswitch max ports from a struct
Patch-3 uses xarray for vport and representor lookup
Patch-4 considers SF for an additioanl range of SF vports
Patch-5 relies on SF hw table to check SF support
Patch-6 extends SF devlink port attribute for external flag
Patch-7 stores the per controller SF allocation attributes
Patch-8 uses SF function id for filtering events
Patch-9 uses helper for allocation and free
Patch-10 splits hw table into per controller table and generic one
Patch-11 extends sf table for additional range
==================
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmCDz80ACgkQSD+KveBX
+j4LmggAwS9otYoo639Kmow/wlMZ6yyLsH02zVMFLEJ2AE4VbL73i4iiQ67ZWygL
yQ8HawPAnythx4RsN/M6/WjSKRpdqTC27C9CpdM78zhXb1vnOrlzba7rYngqmo7N
5fIkGyjsUGHNqq+15SftK7JYbXFTe1b5RdWawXkQoyBlXTTBamyxD7C5NMpoDots
/e88Bs8Zy5nVPZqPchIId8TZEKKuO/heTz8ks6q6s/t1MGj7QP+ddxVMgNg00NR5
OpNTr7YYdpHxpfLSUZgdHaptwwKOx+nou8LdJkIKWPs7SHX6HDggyZJjGBOEWtE7
qG7oSip4olOTM0w9PZrAewLwSYhq7Q==
=oqBr
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2021-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-04-21
devlink external port attribute for SF (Sub-Function) port flavour
This adds the support to instantiate Sub-Functions on external hosts
E.g when Eswitch manager is enabled on the ARM SmarNic SoC CPU, users
are now able to spawn new Sub-Functions on the Host server CPU.
Parav Pandit Says:
==================
This series introduces and uses external attribute for the SF port to
indicate that a SF port belongs to an external controller.
This is needed to generate unique phys_port_name when PF and SF numbers
are overlapping between local and external controllers.
For example two controllers 0 and 1, both of these controller have a SF.
having PF number 0, SF number 77. Here, phys_port_name has duplicate
entry which doesn't have controller number in it.
Hence, add controller number optionally when a SF port is for an
external controller. This extension is similar to existing PF and VF
eswitch ports of the external controller.
When a SF is for external controller an example view of external SF
port and config sequence:
On eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
phys_port_name construction:
$ cat /sys/class/net/eth1/phys_port_name
c1pf0sf77
Patch summary:
First 3 patches prepares the eswitch to handle vports in more generic
way using xarray to lookup vport from its unique vport number.
Patch-1 returns maximum eswitch ports only when eswitch is enabled
Patch-2 prepares eswitch to return eswitch max ports from a struct
Patch-3 uses xarray for vport and representor lookup
Patch-4 considers SF for an additioanl range of SF vports
Patch-5 relies on SF hw table to check SF support
Patch-6 extends SF devlink port attribute for external flag
Patch-7 stores the per controller SF allocation attributes
Patch-8 uses SF function id for filtering events
Patch-9 uses helper for allocation and free
Patch-10 splits hw table into per controller table and generic one
Patch-11 extends sf table for additional range
==================
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds device tree probing to the IXP4xx ethernet
driver.
Add a platform data bool to tell us whether to
register an MDIO bus for the device or not, as well
as the corresponding NPE.
We need to drop the memory region request as part of
this since the OF core will request the memory for the
device.
Cc: Zoltan HERPAI <wigyori@uid0.hu>
Cc: Raylynn Knight <rayknight@me.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver was using a really dated way of obtaining the
phy by printing a string and using it with phy_connect().
Switch to using more reasonable modern interfaces.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_rx_pkt(), the RX buffers are expected to complete in order.
If the RX consumer index indicates an out of order buffer completion,
it means we are hitting a hardware bug and the driver will abort all
remaining RX packets and reset the RX ring. The RX consumer index
that we pass to bnxt_discard_rx() is not correct. We should be
passing the current index (tmp_raw_cons) instead of the old index
(raw_cons). This bug can cause us to be at the wrong index when
trying to abort the next RX packet. It can crash like this:
#0 [ffff9bbcdf5c39a8] machine_kexec at ffffffff9b05e007
#1 [ffff9bbcdf5c3a00] __crash_kexec at ffffffff9b111232
#2 [ffff9bbcdf5c3ad0] panic at ffffffff9b07d61e
#3 [ffff9bbcdf5c3b50] oops_end at ffffffff9b030978
#4 [ffff9bbcdf5c3b78] no_context at ffffffff9b06aaf0
#5 [ffff9bbcdf5c3bd8] __bad_area_nosemaphore at ffffffff9b06ae2e
#6 [ffff9bbcdf5c3c28] bad_area_nosemaphore at ffffffff9b06af24
#7 [ffff9bbcdf5c3c38] __do_page_fault at ffffffff9b06b67e
#8 [ffff9bbcdf5c3cb0] do_page_fault at ffffffff9b06bb12
#9 [ffff9bbcdf5c3ce0] page_fault at ffffffff9bc015c5
[exception RIP: bnxt_rx_pkt+237]
RIP: ffffffffc0259cdd RSP: ffff9bbcdf5c3d98 RFLAGS: 00010213
RAX: 000000005dd8097f RBX: ffff9ba4cb11b7e0 RCX: ffffa923cf6e9000
RDX: 0000000000000fff RSI: 0000000000000627 RDI: 0000000000001000
RBP: ffff9bbcdf5c3e60 R8: 0000000000420003 R9: 000000000000020d
R10: ffffa923cf6ec138 R11: ffff9bbcdf5c3e83 R12: ffff9ba4d6f928c0
R13: ffff9ba4cac28080 R14: ffff9ba4cb11b7f0 R15: ffff9ba4d5a30000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
Fixes: a1b0e4e684 ("bnxt_en: Improve RX consumer index validity check.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable result is being assigned a value from a calculation
however the variable is never read, so this redundant variable
can be removed.
Cleans up the following clang-analyzer warning:
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:1488:2:
warning: Value stored to 'pos' is never read
[clang-analyzer-deadcode.DeadStores].
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:876:3:
warning: Value stored to 'pos' is never read
[clang-analyzer-deadcode.DeadStores].
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:36:3:
warning: Value stored to 'start' is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extended the SF table to cover additioanl SF id range of external
controller.
A user optionallly provides the external controller number when user
wants to create SF on the external controller.
An example on eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Device has SF ids in two different contiguous ranges. One for the local
controller and second for the external controller's PF.
Each such range has its own maximum number of functions and base id.
To allocate SF from either of the range, prepare code to split into
range specific fields into its own structure.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use helper routines for SF id and SF table allocation and free
so that subsequent patch can reuse it for multiple SF function
id range.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Vhca events on eswitch manager are received for all the functions on the
NIC, including for SFs of external host PF controllers.
While SF device handler is only interested in SF devices events related
to its own PF.
Hence, validate if the function belongs to self or not.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SF ids in the device are in two different contiguous ranges. One for
the local controller and second for the external host controller.
Prepare code to handle multiple start function id by storing it in the
table.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Extended SF port attributes to have optional external flag similar to
PCI PF and VF port attributes.
External atttibute is required to generate unique phys_port_name when PF number
and SF number are overlapping between two controllers similar to SR-IOV
VFs.
When a SF is for external controller an example view of external SF
port and config sequence.
On eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
phys_port_name construction:
$ cat /sys/class/net/eth1/phys_port_name
c1pf0sf77
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Supporting SF allocation is currently checked at two places:
(a) SF devlink port allocator and
(b) SF HW table handler.
Both layers are using HCA CAP to identify it using helper routine
mlx5_sf_supported() and mlx5_sf_max_functions().
Instead, rely on the HW table handler to check if SF is supported
or not.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Query SF vports count and base id of host PF from the firmware.
Account these ports in the total port calculation whenever it is non
zero.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently vport number to vport and its representor are mapped using an
array and an index.
Vport numbers of different types of functions are not contiguous. Adding
new such discontiguous range using index and number mapping is increasingly
complex and hard to maintain.
Hence, maintain an xarray of vport and rep whose lookup is done based on
the vport number.
Each VF and SF entry is marked with a xarray mark to identify the function
type. Additionally PF and VF needs special handling for legacy inline
mode. They are additionally marked as host function using additional
HOST_FN mark.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Total vports are already stored during eswitch initialization. Instead
of calculating everytime, read directly from eswitch.
Additionally, host PF's SF vport information is available using
QUERY_HCA_CAP command. It is not available through HCA_CAP of the
eswitch manager PF.
Hence, this patch prepares the return total eswitch vport count from the
existing eswitch struct.
This further helps to keep eswitch port counting macros and logic within
eswitch.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5_eswitch_get_total_vports() doesn't honor MLX5_ESWICH Kconfig flag.
When MLX5_ESWITCH is disabled, FS layer continues to initialize eswitch
specific ACL namespaces.
Instead, start honoring MLX5_ESWITCH flag and perform vport specific
initialization only when vport count is non zero.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2021-04-23
This series contains updates to i40e and iavf drivers.
Aleksandr adds support for VIRTCHNL_VF_CAP_ADV_LINK_SPEED in i40e which
allows for reporting link speed to VF as a value instead of using an
enum; helper functions are created to remove repeated code.
Coiby Xu reduces memory use of i40e when using kdump by reducing Tx, Rx,
and admin queue to minimum values. Current use causes failure of kdump.
Stefan Assmann removes duplicated free calls in iavf.
Haiyue cleans up a loop to return directly when if the value is found
and changes some magic numbers to defines for better maintainability
in iavf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch to support PTP Sync packet one-step timestamping
described one-step timestamping packet handling logic as below in
commit message:
- Trasmit packet immediately if no other one in transfer, or queue to
skb queue if there is already one in transfer.
The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit
lock and to send one skb in skb queue if has.
There was not problem of the description, but there was a mistake in
implementation. The locking/test_and_set_bit_lock() should be put in
enetc_start_xmit() which may be called by worker, rather than in
enetc_xmit(). Otherwise, the worker calling enetc_start_xmit() after
bit lock released is not able to lock again for transfer.
Fixes: 7294380c52 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace a tight busy-wait loop without a pause with a standard
readx_poll_timeout_atomic routine with a 5 us poll period.
Tested by booting a MT7621 device to ensure the driver initializes
properly.
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This improves GRO performance
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[Ilya: Use MTK_RXD4_FOE_ENTRY instead of GENMASK(13, 0)]
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use napi_complete_done to communicate total TX and RX work done to NAPI.
Count total RX work up instead of remaining work down for clarity.
Remove unneeded local variables for clarity. Use do {} while instead of
goto for clarity.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid rearming interrupt if napi_complete returns false
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uncached memory access is expensive, and there is no need to access all
descriptor words if we can't process them anyway
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The value is only updated by the CPU, so it is cheaper to access from the
ring data structure than from a hardware register.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduces the number of interrupts under load
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[Ilya: add documentation for new struct fields]
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
256 descriptors is not enough for multi-gigabit traffic under load on
MT7622. Bump it to 512 to improve performance.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improves tx performance
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running short on descriptors, only stop the queue for the netdev that
tx was attempted for. By the time something tries to send on the other
netdev, the ring might have some more room already.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
usleep_range often ends up sleeping much longer than the 10-20us provided
as a range here. This causes significant latency in mdio bus acceses,
which easily adds multiple seconds to the boot time on MT7621 when polling
DSA slave ports.
Use cond_resched instead of usleep_range, since the MDIO access does not
take much time
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should improve performance, since it can use bulk free
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case build_skb fails, call skb_free_frag on the correct pointer. Also
update the DMA structures with the new mapping before exiting, because
the mapping was successful
Suggested-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since build_skb accesses the data area (for initializing shinfo), dma unmap
needs to happen before that call
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[Ilya: split build_skb cleanup fix into a separate commit]
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VLAN ID in the rx descriptor is only valid if the RX_DMA_VTAG bit is
set. Fixes frames wrongly marked with VLAN tags.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[Ilya: fix commit message]
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mana_gd_poll_cq() may return -1 if an overflow error is detected (this
should never happen unless there is a bug in the driver or the hardware).
Fix the type of the variable "comp_read" by using int rather than u32.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the FDIR entry is found, just return the result directly to break
the loop.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The maximum number (2) of flex-byte support is derived from ethtool
use-def data size (8 byte).
Change the magic number 2 to macro definition, and add the comment to
track the design thinking, so the code is clear and easily maintained.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Both iavf_free_all_tx_resources() and iavf_free_all_rx_resources() have
already been called in the very same function.
Remove the duplicate calls.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The minimum size of admin send/receive queue is 1 and 2 respectively.
The admin send queue can't be set to 1 because in that case, the
firmware would fail to init.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use the minimum of the number of descriptors thus we will allocate the
minimal ring buffers for kdump.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Set the number of the MSI-X vectors to 1. When MSI-X is enabled,
it's not allowed to use more TC queue pairs than MSI-X vectors
(pf->num_lan_msix) exist. Thus the number of Tx and Rx pairs
(vsi->num_queue_pairs) will be equal to the number of MSI-X vectors,
i.e., 1.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Refactor repeated link state reporting code into a separate helper
functions: i40e_set_vf_link_state() i40e_vc_link_speed2mbps().
Add support of VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
DWMAC Core 5.20 onwards supports HW descriptor prefetching.
Additionally, it also depends on platform specific RTL configuration.
This capability could be enabled by setting DMA_Mode bit-19 (DCHE).
So, to enable this cability, platform must set plat->dma_cfg->dche = true
and the DWMAC core version must be 5.20 onwards. Else, this capability
wouldn`t be configured
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handling comm_channel_event in mlx4_master_comm_channel uses a double
loop to determine which slaves have requested work. The search is
always started at lowest slave. This leads to unfairness; lower VFs
tends to be prioritized over higher VFs.
The patch uses find_next_bit to determine which slaves to handle.
Fairness is implemented by always starting at the next to the last
start.
An MPI program has been used to measure improvements. It runs 500
ibv_reg_mr, synchronizes with all other instances and then runs 500
ibv_dereg_mr.
The results running 500 processes, time reported is for running 500
calls:
ibv_reg_mr:
Mod. Org.
mlx4_1 403.356ms 424.674ms
mlx4_2 403.355ms 424.674ms
mlx4_3 403.354ms 424.674ms
mlx4_4 403.355ms 424.674ms
mlx4_5 403.357ms 424.677ms
mlx4_6 403.354ms 424.676ms
mlx4_7 403.357ms 424.675ms
mlx4_8 403.355ms 424.675ms
ibv_dereg_mr:
Mod. Org.
mlx4_1 116.408ms 142.818ms
mlx4_2 116.434ms 142.793ms
mlx4_3 116.488ms 143.247ms
mlx4_4 116.679ms 143.230ms
mlx4_5 112.017ms 107.204ms
mlx4_6 112.032ms 107.516ms
mlx4_7 112.083ms 184.195ms
mlx4_8 115.089ms 190.618ms
Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem is that bnxt_show_temp() returns long but "rc" is an int
and "len" is a u32. With ternary operations the type promotion is quite
tricky. The negative "rc" is first promoted to u32 and then to long so
it ends up being a high positive value instead of a a negative as we
intended.
Fix this by removing the ternary.
Fixes: d69753fa1e ("bnxt_en: return proper error codes in bnxt_show_temp")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-04-22
This series contains updates to virtchnl header file, ice, and iavf
drivers.
Vignesh adds support to warn about potentially malicious VFs; those that
are overflowing the mailbox for the ice driver.
Michal adds support for an allowlist/denylist of VF commands based on
supported capabilities for the ice driver.
Brett adds support for iavf UDP segmentation offload by adding the
capability bit to virtchnl, advertising support in the ice driver, and
enabling it in the iavf driver. He also adds a helper function for
getting the VF VSI for ice.
Colin Ian King removes an unneeded pointer assignment.
Qi enables support in the ice driver to support virtchnl requests from
the iavf to configure its own RSS input set. This includes adding new
capability bits, structures, and commands to virtchnl header file.
Haiyue enables configuring RSS flow hash via ethtool to support TCP, UDP
and SCTP protocols in iavf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a few warnings about empty debug macros in this driver:
drivers/net/ethernet/neterion/vxge/vxge-main.c: In function 'vxge_probe':
drivers/net/ethernet/neterion/vxge/vxge-main.c:4480:76: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
4480 | "Failed in enabling SRIOV mode: %d\n", ret);
Change them to proper 'do { } while (0)' expressions to make the
code a little more robust and avoid the warnings.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
A link time bug that I had fixed before has come back now that
another sub-module was added to the enetc driver:
ERROR: modpost: "enetc_ierb_register_pf" [drivers/net/ethernet/freescale/enetc/fsl-enetc.ko] undefined!
The problem is that the enetc Makefile is not actually used for
the ierb module if that is the only built-in driver in there
and everything else is a loadable module.
Fix it by always entering the directory this time, regardless
of which symbols are configured. This should reliably fix the
problem and prevent it from coming back another time.
Fixes: 112463ddbe ("net: dsa: felix: fix link error")
Fixes: e7d48e5fbf ("net: enetc: add a mini driver for the Integrated Endpoint Register Block")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MANA driver causes a build failure in some configurations when
it selects an unavailable symbol:
WARNING: unmet direct dependencies detected for PCI_HYPERV
Depends on [n]: PCI [=y] && X86_64 [=y] && HYPERV [=n] && PCI_MSI [=y] && PCI_MSI_IRQ_DOMAIN [=y] && SYSFS [=y]
Selected by [y]:
- MICROSOFT_MANA [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROSOFT [=y] && PCI_MSI [=y] && X86_64 [=y]
drivers/pci/controller/pci-hyperv.c: In function 'hv_irq_unmask':
drivers/pci/controller/pci-hyperv.c:1217:9: error: implicit declaration of function 'hv_set_msi_entry_from_desc' [-Werror=implicit-function-declaration]
1217 | hv_set_msi_entry_from_desc(¶ms->int_entry.msi_entry, msi_desc);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
A PCI driver should never depend on a particular host bridge
implementation in the first place, but if we have this dependency
it's better to express it as a 'depends on' rather than 'select'.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide the ability to enable SCTP RSS hashing by ethtool.
It gives users option of generating RSS hash based on the SCTP source
and destination ports numbers, IPv4 or IPv6 source and destination
addresses.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Provides the ability to enable UDP RSS hashing by ethtool.
It gives users option of generating RSS hash based on the UDP source
and destination ports numbers, IPv4 or IPv6 source and destination
addresses.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Provides the ability to enable TCP RSS hashing by ethtool.
It gives users option of generating RSS hash based on the TCP source
and destination ports numbers, IPv4 or IPv6 source and destination
addresses.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the virtchnl message interface to VF, so that VF can request RSS
input set(s) based on PF's capability.
This framework allows ethtool RSS config support on the VF driver.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, RSS hash input is not available to AVF by ethtool, it is set
by the PF directly.
Add the RSS configure support for AVF through new virtchnl message, and
define the capability flag VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF to query this
new RSS offload support.
Signed-off-by: Jia Guo <jia.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Bo Chen <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, the driver gets the VF's VSI by using a long string of
dereferences (i.e. vf->pf->vsi[vf->lan_vsi_idx]). If the method to get
the VF's VSI were to change the driver would have to change it in every
location. Fix this by adding the helper ice_get_vf_vsi().
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pointer vsi is being re-assigned a value that is never read,
the assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add code to support UDP segmentation offload (USO) for
hardware that supports it.
Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As the hardware is capable of supporting UDP segmentation offload, add a
capability bit to virtchnl.h to communicate this and have the driver
advertise its support.
Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Declare bitmap of allowed commands on VF. Initialize default
opcodes list that should be always supported. Declare array of
supported opcodes for each caps used in virtchnl code.
Change allowed bitmap by setting or clearing corresponding
bit to allowlist (bit set) or denylist (bit clear).
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Attempt to detect malicious VFs and, if suspected, log the information but
keep going to allow the user to take any desired actions.
Potentially malicious VFs are identified by checking if the VFs are
transmitting too many messages via the PF-VF mailbox which could cause an
overflow of this channel resulting in denial of service. This is done by
creating a snapshot or static capture of the mailbox buffer which can be
traversed and in which the messages sent by VFs are tracked.
Co-developed-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Co-developed-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Co-developed-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The call to clk_disable_unprepare() can happen before priv is
initialized. This means moving clk_disable_unprepare out of
out_release into a new label.
Fixes: 8ef7adc6be ("net: ethernet: ravb: Enable optional refclk")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a lot of frames were received in the short term, the driver
caused a stuck of receiving until a new frame was received. For example,
the following command from other device could cause this issue.
$ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>
The previous code always cleared the interrupt flag of RX but checks
the interrupt flags in ravb_poll(). So, ravb_poll() could not call
ravb_rx() in the next time until a new RX frame was received if
ravb_rx() returned true. To fix the issue, always calls ravb_rx()
regardless the interrupt flags condition.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TSO and TBS cannot co-exist and current implementation requires two
fixes:
1) stmmac_open() does not need to call stmmac_enable_tbs() because
the MAC is reset in stmmac_init_dma_engine() anyway.
2) Inside stmmac_hw_setup(), we should call stmmac_enable_tso() for
TX Q that is _not_ configured for TBS.
Fixes: 579a25a854 ("net: stmmac: Initial support for TBS")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TSO and TBS cannot coexist, for now we set Intel mGbE controller to use
below TX Queue mapping: TxQ0 uses TSO and the rest of TXQs supports TBS.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We recently added some new locking to this function but one error path
was overlooked. We need to drop the lock before returning.
Fixes: f4da56529d ("net: stmmac: Add support for external trigger timestamping")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of lanes of devlink port should be correctly initialized
when registering the port, so that the input check when running
"devlink port split <port> count <N>" can pass.
Fixes: a21cf0a833 ("devlink: Add a new devlink port lanes attribute and pass to netlink")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a new mailbox to get CPT stats, includes performance
counters, CPT engines status and RXC status.
Signed-off-by: Narayana Prasad Raju Atherya <pathreya@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10K CPT coprocessor includes a component named RXC which
is responsible for reassembly of inner IP packets. RXC has
the feature to evict oldest entries based on age/threshold.
This patch adds a new mailbox to configure reassembly age
or threshold.
Signed-off-by: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds changes to existing CPT mailbox messages to support
CN10K CPT block. This patch also adds new register defines
for CN10K CPT.
Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bit-masks used for the TXERRCH and RXERRCH (tx and rx error channels)
are incorrect and always lead to a zero result. The mask values are
currently the incorrect post-right shifted values, fix this by setting
them to the currect values.
(I double checked these against the TMS320TCI6482 data sheet, section
5.30, page 127 to ensure I had the correct mask values for the TXERRCH
and RXERRCH fields in the MACSTATUS register).
Addresses-Coverity: ("Operands don't affect result")
Fixes: a6286ee630 ("net: Add TI DaVinci EMAC driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable err is being initialized with a value that is
never read and it is being updated later with a new value. The
initialization is redundant and can be removed
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're starting from a TXQ label, not a TXQ type, so
efx_channel_get_tx_queue() is inappropriate. This worked by chance,
because labels and types currently match on EF10, but we shouldn't
rely on that.
Fixes: 12804793b1 ("sfc: decouple TXQ type from label")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're starting from a TXQ label, not a TXQ type, so
efx_channel_get_tx_queue() is inappropriate (and could return NULL,
leading to panics).
Fixes: 12804793b1 ("sfc: decouple TXQ type from label")
Cc: stable@vger.kernel.org
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're starting from a TXQ instance number ('qid'), not a TXQ type, so
efx_get_tx_queue() is inappropriate (and could return NULL, leading
to panics).
Fixes: 12804793b1 ("sfc: decouple TXQ type from label")
Reported-by: Trevor Hemsley <themsley@voiceflex.com>
Cc: stable@vger.kernel.org
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that enetc supports flow control we have to make sure the settings in
the IERB are correct. Therefore, we actually depend on the enetc-ierb
module. Previously it was possible that this module was disabled while the
enetc was enabled. Fix it by automatically select the enetc-ierb module.
Fixes: e7d48e5fbf ("net: enetc: add a mini driver for the Integrated Endpoint Register Block")
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw used to hold an array of qdiscs indexed by the TC number. In the
previous patch, it was changed to allocate child qdiscs dynamically, and
they are now indexed by band number. Follow suit with the array of future
FIFOs.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of keeping qdiscs in globally-preallocated arrays, introduce a
per-qdisc-kind value num_classes, and then allocate the necessary child
qdiscs (if any) based on that value. Since now dynamic allocation is
involved, mlxsw_sp_qdisc_replace() gets messy enough that it is worth it to
split it to two cases: a new qdisc allocation and a change of existing
qdisc. (Note that the change also includes what TC formally calls replace,
if the qdisc kind is the same.)
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FIFO handler currently guards accesses to the future FIFO tracking by
asserting RTNL. In the future, the changes to the qdisc state will be more
thorough, so other qdiscs will need this guarding is as well. In order
to not further the RTNL infestation, instead convert to a custom lock that
will guard accesses to the qdisc state.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw currently allows a two-level structure of qdiscs: the root and
possibly a number of children. In order to support offloading more general
qdisc trees, introduce to struct mlxsw_sp_qdisc a pointer to child qdiscs.
Refer to the child qdiscs through this pointer, instead of going through
the tclass_qdiscs in qdisc_state. Additionally introduce a field
num_classes, which holds number of given qdisc's children.
Also introduce a generic function for walking qdisc trees. Rewrite
mlxsw_sp_qdisc_find() and _find_by_handle() to use the generic walker.
For now, keep the qdisc_state.tclass_qdisc, and just point root_qdiscs's
children to this array. Following patches will make the allocation dynamic.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a qdisc is removed, it is necessary to update the backlog value at its
parent--unless the qdisc is at root position. RED, TBF and FIFO all do
that, each separately. Since all of them need to do this, just promote the
operation directly to mlxsw_sp_qdisc_destroy(), instead of deferring it to
individual destructors. Since FIFO dtor thus becomes trivial, remove it.
Add struct mlxsw_sp_qdisc.parent to point at the parent qdisc. This will be
handy later as deeper structures are offloaded. Use the parent qdisc to
find the chain of parents whose backlog value needs to be updated.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tclass_num is just a number, a value that would be ordinarily passed around
as an int. (Which is unlike a u8 prio_bitmap.) In several places,
tclass_num already is an int. Convert the remaining instances.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function mlxsw_sp_qdisc_compare() is invoked a couple lines above this
check, which will bounce any requests where this condition does not hold.
Therefore drop it.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The purpose of this function is to filter out events that are related to
qdiscs that are not offloaded, or are not offloaded anymore. But the
function is unnecessarily thorough:
- mlxsw_sp_qdisc pointer is never NULL in the context where it is called
- Two qdiscs with the same handle will never have different types. Even
when replacing one qdisc with another in the same class, Linux will not
permit handle reuse unless the qdisc type also matches.
Simplify the function by omitting these two unnecessary conditions.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mlxsw_sp_qdisc argument is not used in any of the actual callbacks.
Drop it.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset provides some updates to mlx5e and mlx5 SW steering drivers:
1) Tariq and Vladyslav they both provide some trivial update to mlx5e netdev.
The next 12 patches in the patchset are focused toward mlx5 SW steering:
2) 3 trivial cleanup patches
3) Dynamic Flex parser support:
Flex parser is a HW parser that can support protocols that are not
natively supported by the HCA, such as Geneve (TLV options) and GTP-U.
There are 8 such parsers, and each of them can be assigned to parse a
specific set of protocols.
4) Enable matching on Geneve TLV options
5) Use Flex parser for MPLS over UDP/GRE
6) Enable matching on tunnel GTP-U and GTP-U first extension
header using
7) Improved QoS for SW steering internal QPair for a better insertion rate
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmB+R90ACgkQSD+KveBX
+j5UKAf+OODHHlYUxp3k4uSRGPNHuVZsUw4DjoCY6b0E4uMup9bP0YF7/B1I8bpC
xTbVK9SzYTVOt0pxBu3aJ1Qom5hpJt5iT7QG9m5LlhEn/ZD3KqpnenGuDMIlyOa5
EvLIdeOoWxJ+7Za6pULy4hsbUcu8hupsBBN+poC3dN4akQu1NyvFE4mdHVTP/c7n
DB0mZWskoDyXm1dQiZ4+cDWoltrrpFLo5n7N08QbS+AvJ7jsRrT5myBU4IPMEfP6
peRecTpKZEOwBwTzxi41ao5XZYnKTROD3zax30v6DXxw5K41SQKwvCjgvpc2/V1J
jymdJzdYa17mxu5XMC0aaQoFo2VBDg==
=wwzX
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2021-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-04-19
This patchset provides some updates to mlx5e and mlx5 SW steering drivers:
1) Tariq and Vladyslav they both provide some trivial update to mlx5e netdev.
The next 12 patches in the patchset are focused toward mlx5 SW steering:
2) 3 trivial cleanup patches
3) Dynamic Flex parser support:
Flex parser is a HW parser that can support protocols that are not
natively supported by the HCA, such as Geneve (TLV options) and GTP-U.
There are 8 such parsers, and each of them can be assigned to parse a
specific set of protocols.
4) Enable matching on Geneve TLV options
5) Use Flex parser for MPLS over UDP/GRE
6) Enable matching on tunnel GTP-U and GTP-U first extension
header using
7) Improved QoS for SW steering internal QPair for a better insertion rate
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch the ag71xx on Atheros AR9331 will able to run generic net
selftests.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch FEC on iMX will able to run generic net selftests
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when GID is deleted, it zero out all the fields of the RoCE
address in the SET_ROCE_ADDRESS command for a specified index.
roce_version = 0 means RoCEv1 in the SET_ROCE_ADDRESS command.
This assumes that device has RoCEv1 always enabled which is not always
correct. For example Subfunction does not support RoCEv1.
Due to this assumption a previously added RoCEv2 GID is always deleted as
RoCEv1 GID. This results in a below syndrome:
mlx5_core.sf mlx5_core.sf.4: mlx5_cmd_check:777:(pid 4256): SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x12822d)
Hence set the right RoCE version during GID deletion provided by the core.
Link: https://lore.kernel.org/r/d3f54129c90ca329caf438dbe31875d8ad08d91a.1618753425.git.leonro@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When using SW steering, rule insertion rate depends on the RDMA RC QP
performance used for writing to the ICM. During stress this QP is competing
on the HW resources with all the other QPs that are used to send data.
To protect SW steering QP's performance in such cases, we set this QP to
use isolated VL. The VL number is reserved by FW and is not exposed to the
driver.
Support for this QP on isolated VL exists only when both force-loopback and
isolate_vl_tc capabilities are set.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When supported by the device, SW steering RoCE RC QP that is used to
write/read to/from ICM will be created with force-loopback attribute.
Such QP doesn't require GID index upon creation.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Query the flex_parser id that's intended for TNL_MPLS
and use an appropriate flex parser for MPLS over UDP/GRE.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Set the flex parser ID dynamicly for ICMP instead of relying
on hardcoded values.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Flex parser is a HW parser that can support protocols that are not
natively supported by the HCA, such as Geneve (TLV options) and GTP-U.
There are 8 such parsers, and each of them can be assigned to parse a
specific set of protocols.
This patch adds misc4 match params which allows using a correct flex parser
that was programmed to the required protocol.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Remove MPLS specific fields from flex parser 3 layout.
Flex parser can be used for multiple protocols and should
not be hardcoded to a specific type.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add error code to the error messages and removed duplicated message:
if termination table creation failed, we already get an error message
in mlx5_eswitch_termtbl_create, so no need for the additional error print
in the calling function.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Rename the argument to better reflect that the meaning is
not number of records, but wheather or not we should
ring the dorbell.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Striding RQ attributes below are mutually dependent. An unaware
change to one might take the others out of the valid range derived
by the HW caps:
- The MPWQE size in bytes
- The number of strides in a MPWQE
- The stride size
Add checks to verify they are valid and comply to the HW spec
and SW assumptions/requirements.
This is not a fix, no particular issue exists today.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If mlx5e_devlink_port_register() fails, driver may try to register
devlink health TX and RX reporters on non-registered devlink port.
Instead, create health reporters only if mlx5e_devlink_port_register()
does not fail. And destroy reporters only if devlink_port is registered.
Also, change mlx5e_get_devlink_port() behavior and return NULL in case
port is not registered to replicate devlink's wrapper when ndo is not
implemented.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The changes done in commit [1] were missed by the code movements
done in [2], as they were developed in ~parallel.
Here we re-apply them.
[1] commit e4484d9df5 ("net/mlx5e: Enable striding RQ for Connect-X IPsec capable devices")
[2] commit b3a131c2a1 ("net/mlx5e: Move params logic into its dedicated file")
Fixes: b3a131c2a1 ("net/mlx5e: Move params logic into its dedicated file")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Michael suggest a few more stats we can expose.
$ ethtool -S eth0 --groups eth-mac
Standard stats for eth0:
eth-mac-FramesTransmittedOK: 902623288966
eth-mac-FramesReceivedOK: 28727667047
eth-mac-FrameCheckSequenceErrors: 1
eth-mac-AlignmentErrors: 0
eth-mac-OutOfRangeLengthField: 0
$ ethtool -S eth0 | grep '\(fcs\|align\|oor\)'
rx_fcs_err_frames: 1
rx_align_err_frames: 0
tx_fcs_err_frames: 0
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add PCI match for AC3X 98DX3265 device which is supported by the current
driver and firmware.
Signed-off-by: Vadym Kochan <vkochan@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move structs/defines for ethernet/dma register into driver, since they
are only used for this driver and remove any MIPS specific includes.
This makes it possible to COMPILE_TEST the driver.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
With device tree clock is provided via CCF. For non device tree
use a maximum clock value to not overclock the PHY. The non device
tree usage will go away after platform is converted to DT.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there is no mac address passed via platform data try to get it via
device tree and fall back to a random mac address, if all fail.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of access to struct korina_device by just passing the mac
address via platform data and use drvdata for passing netdev to remove
function.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of messing with MIPS specific macros use DMA API for mapping
descriptors and skbs.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove helpers, which are only used in one call site.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Descriptors are mapped uncached so there is no need to do any cache
handling for them.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify probe/remove code by using devm_ functions.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed MDIO functions to work reliable and not just by accident.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not hit EOPNOTSUPP when flowtable offload provides a VLAN pop action.
Fixes: efce49dfe6 ("netfilter: flowtable: add vlan pop action offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch 2ed37183ab ("netfilter: flowtable: separate replace, destroy and
stats to different workqueues") splits the workqueue per event type. Add
a mutex to serialize updates.
Fixes: 502e84e238 ("net: ethernet: mtk_eth_soc: add flow offloading support")
Reported-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The former fix only papered over the actual problem: the
ethernet core expects the netdev .dev member to have the
proper DMA masks set, or there will be BUG_ON() triggered
in kernel/dma/mapping.c.
Fix this by simply copying dma_mask and dma_mask_coherent
from the parent device.
Fixes: e45d0fad4a ("net: ethernet: ixp4xx: Use parent dev for DMA pool")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Issue was traffic problems after a while with increased ping times if
flow offload is active. It turns out that key_offset with cookie is
needed in rhashtable_params but was re-assigned to head_offset.
Fix the assignment.
Fixes: 502e84e238 ("net: ethernet: mtk_eth_soc: add flow offloading support")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SEPARATOR_VALUE macro is used as separator when getting
the register value, but the value of this macro is different
between pf and vf, it is a bit confusing for the user, so
synchronize the value of vf with pf.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify some inappropriate spaces in comments of struct
hlcgevf_tqp_stats.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When enter suspend mode the counter of pf reset will be increased
twice, since both hclge_prepare_general() and hclge_prepare_wait()
increase this counter. So remove the duplicate counting in
hclge_prepare_general().
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kernel test robot reports build errors in 3 Xilinx ethernet drivers.
They all use ioremap functions that are only available when HAS_IOMEM
is set/enabled. If it is not enabled, they all have build errors,
so make these 3 drivers depend on HAS_IOMEM.
ld: drivers/net/ethernet/xilinx/xilinx_emaclite.o: in function `xemaclite_of_probe':
xilinx_emaclite.c:(.text+0x9fc): undefined reference to `devm_ioremap_resource'
ld: drivers/net/ethernet/xilinx/xilinx_axienet_main.o: in function `axienet_probe':
xilinx_axienet_main.c:(.text+0x942): undefined reference to `devm_ioremap_resource'
ld: drivers/net/ethernet/xilinx/ll_temac_main.o: in function `temac_probe':
ll_temac_main.c:(.text+0x1283): undefined reference to `devm_platform_ioremap_resource_byname'
ld: ll_temac_main.c:(.text+0x13ad): undefined reference to `devm_of_iomap'
ld: ll_temac_main.c:(.text+0x162e): undefined reference to `devm_platform_ioremap_resource'
Fixes: 8a3b7a252d ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Zhang Changzhong <zhangchangzhong@huawei.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: stable@vger.kernel.org
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
In the ENETC receive path, a frame received by the MAC is first stored
in a 256KB 'FIFO' memory, then transferred to DRAM when enqueuing it to
the RX ring. The FIFO is a shared resource for all ENETC ports, but
every port keeps track of its own memory utilization, on RX and on TX.
There is a setting for RX rings through which they can either operate in
'lossy' mode (where the lack of a free buffer causes an immediate
discard of the frame) or in 'lossless' mode (where the lack of a free
buffer in the ring makes the frame stay longer in the FIFO).
In turn, when the memory utilization of the FIFO exceeds a certain
margin, the MAC can be configured to emit PAUSE frames.
There is enough FIFO memory to buffer up to 3 MTU-sized frames per RX
port while not jeopardizing the other use cases (jumbo frames), and
also not consume bytes from the port TX allocations. Also, 3 MTU-sized
frames worth of memory is enough to ensure zero loss for 64 byte packets
at 1G line rate.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NXP ENETC is a 4-port Ethernet controller which 'smells' to
operating systems like 4 distinct PCIe PFs with SR-IOV, each PF having
its own driver instance, but in fact there are some hardware resources
which are shared between all ports, like for example the 256 KB SRAM
FIFO between the MACs and the Host Transfer Agent which DMAs frames to
DRAM.
To hide the stuff that cannot be neatly exposed per port, the hardware
designers came up with this idea of having a dedicated register block
which is supposed to be populated by the bootloader, and contains
everything configuration-related: MAC addresses, FIFO partitioning, etc.
When a port is reset using PCIe Function Level Reset, its defaults are
transferred from the IERB configuration. Most of the time, the settings
made through the IERB are read-only in the port's memory space (if they
are even visible), so they cannot be modified at runtime.
Linux doesn't have any advanced FIFO partitioning requirements at all,
but when reading through the hardware manual, it became clear that, even
though there are many good 'recommendations' for default values, many of
them were not actually put in practice on LS1028A. So we end up with a
default configuration that:
(a) does not have enough TX and RX byte credits to support the max MTU
of 9600 (which the Linux driver claims already) properly (at full speed)
(b) allows the FIFO to be overrun with RX traffic, potentially
overwriting internal data structures.
The last part sounds a bit catastrophic, but it isn't. Frames are
supposed to transit the FIFO for a very short time, but they can
actually accumulate there under 2 conditions:
(a) there is very severe congestion on DRAM memory, or
(b) the RX rings visible to the operating system were configured for
lossless operation, and they just ran out of free buffers to copy
the frame to. This is what is used to put backpressure onto the MAC
with flow control.
So since ENETC has not supported flow control thus far, RX FIFO overruns
were never seen with Linux. But with the addition of flow control, we
should configure some registers to prevent this from happening. What we
are trying to protect against are bad actors which continue to send us
traffic despite the fact that we have signaled a PAUSE condition. Of
course we can't be lossless in that case, but it is best to configure
the FIFO to do tail dropping rather than letting it overrun.
So in a nutshell, this driver is a fixup for all the IERB default values
that should have been but aren't.
The IERB configuration needs to be done _before_ the PFs are enabled.
So every PF searches for the presence of the "fsl,ls1028a-enetc-ierb"
node in the device tree, and if it finds it, it "registers" with the
IERB, which means that it requests the IERB to fix up its default
values. This is done through -EPROBE_DEFER. The IERB driver is part of
the fsl_enetc module, but is technically a platform driver, since the
IERB is a good old fashioned MMIO region, as opposed to ENETC ports
which pretend to be PCIe devices.
The driver was already configuring ENETC_PTXMBAR (FIFO allocation for
TX) because due to an omission, TXMBAR is a read/write register in the
PF memory space. But the manual is quite clear that the formula for this
should depend upon the TX byte credits (TXBCR). In turn, the TX byte
credits are only readable/writable through the IERB. So if we want to
ensure that the TXBCR register also has a value that is correct and in
line with TXMBAR, there is simply no way this can be done from the PF
driver, access to the IERB is needed.
I could have modified U-Boot to fix up the IERB values, but that is
quite undesirable, as old U-Boot versions are likely to be floating
around for quite some time from now.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even though ENETC interfaces are exposed as individual PCIe PFs with
their own driver instances, the ENETC is still fundamentally a
multi-port Ethernet controller, and some parts of the IP take a port
number (as can be seen in the PSFP implementation).
Create a common helper that can be used outside of the TSN code for
retrieving the ENETC port number based on the PF number. This is only
correct for LS1028A, the only Linux-capable instantiation of ENETC thus
far.
Note that ENETC port 3 is PF 6. The TSN code did not care about this
because ENETC port 3 does not support TSN, so the wrong mapping done by
enetc_get_port for PF 6 could have never been hit.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a VF driver for Microsoft Azure Network Adapter (MANA) that will be
available in the future.
Co-developed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Co-developed-by: Shachar Raindel <shacharr@microsoft.com>
Signed-off-by: Shachar Raindel <shacharr@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert indirect probe call to its direct equivalent to create a symbol
link between RDMA and netdev modules. This will give us an ability to
remove custom module reference counting that doesn't belong to the driver.
Link: https://lore.kernel.org/r/20210401065715.565226-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-By: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently, if the user changes the pause settings, the default settings
will be restored after an interface down/up cycle, and also when
resuming from suspend. This doesn't seem to provide the best user
experience. Change this to keep user settings, and just ensure that in
jumbo mode pause is disabled.
Small drawback: When switching back mtu from jumbo to non-jumbo then
pause remains disabled (but user can enable it using ethtool).
I think that's a not too common scenario and acceptable.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
- keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
- fix build after move to net_generic
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Described in fd5736bf9f ("enetc: Workaround for MDIO register access
issue") is a workaround for a hardware bug that requires a register
access of the MDIO controller to never happen concurrently with a
register access of a port PF. To avoid that, a mutual exclusion scheme
with rwlocks was implemented - the port PF accessors are the 'read'
side, and the MDIO accessors are the 'write' side.
When we do XDP_REDIRECT between two ENETC interfaces, all is fine
because the MDIO lock is already taken from the NAPI poll loop.
But when the ingress interface is not ENETC, just the egress is, the
MDIO lock is not taken, so we might access the port PF registers
concurrently with MDIO, which will make the link flap due to wrong
values returned from the PHY.
To avoid this, let's just slap an enetc_lock_mdio/enetc_unlock_mdio at
the beginning and ending of enetc_xdp_xmit. The fact that the MDIO lock
is designed as a rwlock is important here, because the read side is
reentrant (that is one of the main reasons why we chose it). Usually,
the way we benefit of its reentrancy is by running the data path
concurrently on both CPUs, but in this case, we benefit from the
reentrancy by taking the lock even when the lock is already taken
(and that's the situation where ENETC is both the ingress and the egress
interface for XDP_REDIRECT, which was fine before and still is fine now).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the TX ring is congested, enetc_xdp_tx() returns false for the
current XDP frame (represented as an array of software BDs).
This array of software TX BDs is constructed in enetc_rx_swbd_to_xdp_tx_swbd
from software BDs freshly cleaned from the RX ring. The issue is that we
scrub the RX software BDs too soon, more precisely before we know that
we can enqueue the TX BDs successfully into the TX ring.
If we can't enqueue them (and enetc_xdp_tx returns false), we call
enetc_xdp_drop which attempts to recycle the buffers held by the RX
software BDs. But because we scrubbed those RX BDs already, two things
happen:
(a) we leak their memory
(b) we populate the RX software BD ring with an all-zero rx_swbd
structure, which makes the buffer refill path allocate more memory.
enetc_refill_rx_ring
-> if (unlikely(!rx_swbd->page))
-> enetc_new_page
That is a recipe for fast OOM.
Fixes: 7ed2bc8007 ("net: enetc: add support for XDP_TX")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the XDP program returns an invalid action, we should free the RX
buffer.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for one CPU to perform TX hashing (see netdev_pick_tx)
between the 8 ENETC TX rings, and the TX hashing to select TX queue 1.
At the same time, it is possible for the other CPU to already use TX
ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual
exclusion between XDP and the network stack, we run into an issue
because the ENETC TX procedure is not reentrant.
The obvious approach would be to just make XDP take the lock of the
network stack's TX queue corresponding to the ring it's about to enqueue
in.
For XDP_REDIRECT, this is quite straightforward, a lock at the beginning
and end of enetc_xdp_xmit() should do the trick.
But for XDP_TX, it's a bit more complicated. For one, we do TX batching
all by ourselves for frames with the XDP_TX verdict. This is something
we would like to keep the way it is, for performance reasons. But
batching means that the network stack's lock should be kept from the
first enqueued XDP_TX frame and until we ring the doorbell. That is
mostly fine, except for cases when in the same NAPI loop we have mixed
XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while
we are holding the lock from the RX NAPI, then bam, deadlock. The naive
answer could be 'just flush the XDP_TX frames first, then release the
network stack's TX queue lock, then call xdp_do_flush_map()'. But even
xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT
frames, so unless we unlock/relock the TX queue around xdp_do_redirect(),
there simply isn't any clean way to protect XDP_TX from concurrent
network stack .ndo_start_xmit() on another CPU.
So we need to take a different approach, and that is to reserve two
rings for the sole use of XDP. We leave TX rings
0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we
pick them from the end of the priv->tx_ring array.
We make an effort to keep the mapping done by enetc_alloc_msix() which
decides which CPU handles the TX completions of which TX ring in its
NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the
XDP TX ring of CPU 1 is handled by TX ring 7.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that commit d6a2829e82 ("net: enetc: increase RX ring default
size") has increased the RX ring size, it is quite easy to congest the
TX rings when the traffic is predominantly XDP_TX, as the RX ring is
quite a bit larger than the TX one.
Since we bit the bullet and did the expensive thing already (larger RX
rings consume more memory pages), it seems quite foolish to keep the TX
rings small. So make them equally sized with TX.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xdp_do_redirect already contains:
-> dev_map_enqueue
-> __xdp_enqueue
-> bq_enqueue
-> bq_xmit_all // if we have more than 16 frames
So the logic from enetc will never be hit, because ENETC_DEFAULT_TX_WORK
is 128.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the code path below fails:
enetc_clean_rx_ring_xdp // XDP_PASS
-> enetc_build_skb
-> enetc_map_rx_buff_to_skb
-> build_skb
enetc_clean_rx_ring_xdp will 'break', but that 'break' instruction isn't
strong enough to actually break the NAPI poll loop, just the switch/case
statement for XDP actions. So we increment rx_frm_cnt and go to the next
frames minding our own business.
Instead let's do what the skb NAPI poll function does, and break the
loop now, waiting for the memory pressure to go away. Otherwise the next
calls to build_skb() are likely to fail too.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving a frame with errors, currently we do nothing with it (we
don't construct an skb or an xdp_buff), we just exit the NAPI poll loop.
Let's put the buffer back into the RX ring (similar to XDP_DROP).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
enetc_put_xdp_buff has nothing to do with XDP, frankly, it is just a
helper to populate the recycle end of the shadow RX BD ring
(next_to_alloc) with a given buffer.
On the other hand, enetc_put_rx_buff plays more tricks than its name
would suggest.
So let's rename enetc_put_rx_buff into enetc_flip_rx_buff to reflect the
half-page buffer reuse tricks that it employs, and enetc_put_xdp_buff
into enetc_put_rx_buff which suggests a more garden-variety operation.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Later in enetc_clean_tx_ring we have:
/* Scrub the swbd here so we don't have to do that
* when we reuse it during xmit
*/
memset(tx_swbd, 0, sizeof(*tx_swbd));
So these assignments are unnecessary.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2021-04-16
This series contains updates to igb and igc drivers.
Ederson adjusts Tx buffer distributions in Qav mode to improve
TSN-aware traffic for igb. He also enable PPS support and auxiliary PHC
functions for igc.
Grzegorz checks that the MTA register was properly written and
retries if not for igb.
Sasha adds reporting of EEE low power idle counters to ethtool and fixes
a return value being overwritten through looping for igc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the names seem to strongly correlate with names from
the standard and RFC. Whether ..+good_frames are indeed Frames..OK
I'm the least sure of.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw has nicely grouped stats, add support for standard uAPI.
I'm guessing the register access part. Compile tested only.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset introduces updates to mlx5e netdev driver.
1) Tariq refactors TLS offloads and adds resiliency against RX resync
failures
2) Maxim reduces code duplications by unifying channels reset flow
regardless if channels are closed or open
3) Aya Enhances TX/RX health reporters diagnostics to expose the
internal clock time-stamping format
4) Moshe adds support for ethtool extended link state, to show the reason
for link down
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmB53AUACgkQSD+KveBX
+j6rzAf+JwJG9G7GSj3a/xird4dlgt4xPbRLB19pTw19ZyHZyujDxdN4QM3r5hTk
5ua1PnhYYaUcyPFvdgR9J0cIJ3QRaxZ+q/XnkE9Yo0eZ1DJ0SL/n6rxEQpcxpee1
XP7qjJu3leVwh5mVW2uOx/ClrL9vYb/fG3Q00j59rUB+i9bZszXZgZ99hJvYBFTB
k7W/9X6BNxuLlEg/Ui9L499aDWHRcIY5J2ku+1v/8paJZltk+IFv5glYszylE++M
l68drIy3dIjl/Sxj6WR2rHTBus6AIFxWFH8C2L7uqGl97BPjS80snMPIefLJhW+y
bQvzMDtfKDmIpvEIdzHPuEhEdqqteg==
=YCy6
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2021-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-04-16
This patchset introduces updates to mlx5e netdev driver.
1) Tariq refactors TLS offloads and adds resiliency against RX resync
failures
2) Maxim reduces code duplications by unifying channels reset flow
regardless if channels are closed or open
3) Aya Enhances TX/RX health reporters diagnostics to expose the
internal clock time-stamping format
4) Moshe adds support for ethtool extended link state, to show the reason
for link down
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gianfar used to enable all 8 Rx queues (DMA rings) per
ethernet device, even though the controller can only
support 2 interrupt lines at most. This meant that
multiple Rx queues would have to be grouped per NAPI poll
routine, and the CPU would have to split the budget and
service them in a round robin manner. The overhead of
this scheme proved to outweight the potential benefits.
The alternative was to introduce the "Single Queue" polling
mode, supporting one Rx queue per NAPI, which became the
default packet processing option and helped improve the
performance of the driver.
MQ_POLLING also relies on undocumeted device tree properties
to specify how to map the 8 Rx and Tx queues to a given
interrupt line (aka "interrupt group"). Using module parameters
to enable this mode wasn't an option either. Long story short,
MQ_POLLING became obsolete, now it is just dead code, and no
one asked for it so far.
For the Tx queues, multi-queue support (more than 1 Tx queue
per CPU) could be revisited by adding tc MQPRIO support, but
again, one has to consider that there are only 2 interrupt lines.
So the NAPI poll routine would have to service multiple Tx rings.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add parser entries for different IPv4 IHL values.
Each entry will set the L4 header offset according to the IPv4 IHL field.
L3 header offset will set during the parsing of the IPv4 protocol.
Because of missed parser support for IP header length > 20, RX IPv4 checksum HW offload fails
and skb->ip_summed set to CHECKSUM_NONE(checksum done by Network stack).
This patch adds RX IPv4 checksum HW offload capability for frames with IP header length > 20.
v1 --> v2
- Improve commit message.
Suggested-by: Dana Vardi <danat@marvell.com>
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The intention is for the loop to timeout if the body does not succeed.
The current logic calls time_is_before_jiffies(timeout) which is false
until after the timeout, so the loop body never executes.
Fix by using readl_poll_timeout as a more standard and less error-prone
solution.
Fixes: ba37b7caf1 ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tx queue cleanup happens in interrupt handler on same core as rx queue
processing. Both can take considerable amount of processing in high
packet-per-second scenarios.
Sending big amounts of packets can stall the rx processing which is
unfair and also can lead to out-of-memory condition since
__dev_kfree_skb_irq queues the skbs for later kfree in softirq which
is not allowed to happen with heavy load in interrupt handler.
This puts tx cleanup in its own napi and enables threaded napi to
allow the rx/tx queue processing to happen on different cores.
The ability to sustain equal amounts of tx/rx traffic increased:
from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming
Mikrotik 10/25G NIC,
from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter.
Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As explained in bugfix commit 6ab4c3117a ("net: bridge: don't notify
switchdev for local FDB addresses") as well as in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
the switchdev notifiers for FDB entries managed to have a zero-day bug,
which was that drivers would not know what to do with local FDB entries,
because they were not told that they are local. The bug fix was to
simply not notify them of those addresses.
Let us now add the 'is_local' bit to bridge FDB entries, and make all
drivers ignore these entries by their own choice.
Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose EEE Tx and Rx low power idle counters via ethtool
A EEE TX or RX LPI event occurs when the transmitter or the receiver
enters EEE (IEEE802.3az) LPI state.
ethtool --statistics <iface>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The i225 device offers a number of special PTP Hardware Clock features on
the Software Defined Pins (SDPs) - much like i210, which is used as
inspiration for this patch. It enables two possible functions, namely
time stamping external events and periodic output signals.
The assignment of PHC functions to the four SDP can be freely chosen by
the user.
For the external events time stamping, when the SDP (configured as input
by user) level changes, an interrupt is generated and the kernel
Precision Time Protocol (PTP) is informed.
For the periodic output signals, the i225 is configured to generate them
(so the SDP level will change periodically) and the driver also has to
keep updating the time of the next level change. However, this work is
not necessary for some frequencies as the i225 takes care of them
(namely, anything with a half-cycle of 500ms, 250ms, 125ms or < 70ms).
While i225 allows up to four timers to be used to source the time used
on the external events or output signals, this patch uses only one of
those timers. Main reason is to keep it simple, as it's not clear how
these extra timers would be exposed to users. Note that currently a NIC
can expose a single PTP device.
Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The i225 device can produce one interrupt on the full second, much
like i210 - from where this patch is inspired.
This patch sets up the full second interruption on the i225 and when
receiving it, it sends a PPS event to PTP (Precision Time Protocol)
kernel subsystem.
The PTP subsystem exposes the PPS events via ioctl and sysfs, and one
can use the `testptp` tool (tools/testing/selftests/ptp) to check that
the events are being generated.
Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add new function which checks MTA_REGISTER if its filled correctly.
If not then writes again to same register.
There is possibility that i210 and i211 could not accept
MTA_REGISTER settings, specially when you add and remove
many of multicast addresses in short time.
Without this patch there is possibility that multicast settings will be
not always set correctly in hardware.
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add ts_format to 'Common Config' section of the TX/RX devlink reporters
diagnostics info. Possible values for ts_format: 'RT' or 'FRC'
which stands for: Real Time and Free Running Counters correspondingly.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Wrap 1PPS initialization in a helper for a cleaner init flow.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In case the interface was set up but cannot establish the link, ethtool
will print more information to help the user troubleshoot the state.
For example, no link due to missing cable:
$ ethtool eth1
...
Link detected: no (No cable)
Beside the general extended state, drivers can pass additional
information about the link state using the sub-state field. For example:
$ ethtool eth1
...
Link detected: no (Autoneg, No partner detected)
The extended state is available only for specific cases, in other cases
ethtool with print only "Link detected: no" as before
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The bulk size is larger than 16K so use kvzalloc().
The bulk bitmask upper size limit is 16K so use kvcalloc().
Signed-off-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_safe_switch_channels accepts new_chs as a parameter and opens new
channels in place, then copying them to priv->channels. It requires all
the callers to allocate space for this temporary storage of the new
channels.
This commit cleans up the API by replacing new_chs with new_params, a
meaningful subset of new_chs to be filled by the caller. The temporary
space for the new channels is allocated inside mlx5e_safe_switch_params
(a new name for mlx5e_safe_switch_channels). An extra copy of params is
made, but since it's control flow, it's not critical.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit extends mlx5e_safe_switch_channels() to support on-the-fly
configuration changes, when the channels are open, but don't need to be
recreated. Such flows exist when a parameter being changed doesn't
affect how the queues are created, or when the queues can be modified
while remaining active.
Before this commit, such flows were handled as special cases on the
caller site. This commit adds this functionality to
mlx5e_safe_switch_channels(), allowing the caller to pass a boolean
indicating whether it's required to recreate the channels or it's
allowed to skip it. The logic of switching channel parameters is now
completely encapsulated into mlx5e_safe_switch_channels().
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit uses new functionality of mlx5e_safe_switch_channels
introduced by the previous commit to reduce the amount of repeating
similar code all over the driver.
It's very common in mlx5e to call mlx5e_safe_switch_channels when the
channels are open, but assign parameters and run hardware commands
manually when the channels are closed.
After the previous commit it's no longer needed to do such manual things
every time, so this commit removes unneeded code and relies on the new
functionality of mlx5e_safe_switch_channels. Some of the places are
refactored and simplified, where more complex flows are used to change
configuration on the fly, without recreating the channels (the logic is
rewritten in a more robust way, with a reset required by default and a
list of exceptions).
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_safe_switch_channels is used to modify channel parameters and/or
hardware configuration in a safe way, so that if anything goes wrong,
everything reverts to the old configuration and remains in a consistent
state.
However, this function only works when the channels are open. When the
caller needs to modify some parameters, first it has to check that the
channels are open, otherwise it has to assign parameters directly, and
such boilerplate repeats in many different places.
This commit prepares for the refactoring of such places by allowing
mlx5e_safe_switch_channels to work when the channels are closed. In this
case it will assign the new parameters and run the preactivate hook.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>