Take over maintainership of the nfp driver from Simon as he
is moving away from Corigine.
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds MQPRIO Qdisc offload in full 'channel' mode which allows
not only setting up pri:tc mapping, but also configuring TX shapers on
external port FIFOs. The K3 CPSW MQPRIO Qdisc offload is expected to work
with VLAN/priority tagged packets. Non-tagged packets have to be mapped
only to TC0.
- TX traffic classes must be rated starting from TC that has highest
priority and with no gaps
- Traffic classes are used starting from 0, that has highest priority
- min_rate defines Committed Information Rate (guaranteed)
- max_rate defines Excess Information Rate (non guaranteed) and offloaded
as (max_rate[i] - tcX_min_rate[i])
- VLAN/priority tagged packets mapped to TC0 will exit switch with VLAN tag
priority 0
The configuration example:
ethtool -L eth1 tx 5
ethtool --set-priv-flags eth1 p0-rx-ptype-rrobin off
tc qdisc add dev eth1 parent root handle 100: mqprio num_tc 3 \
map 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 1@1 1@2 hw 1 mode channel \
shaper bw_rlimit min_rate 0 100mbit 200mbit max_rate 0 101mbit 202mbit
tc qdisc replace dev eth2 handle 100: parent root mqprio num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 hw 1
ip link add link eth1 name eth1.100 type vlan id 100
ip link set eth1.100 type vlan egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
In the above example two ports share the same TX CPPI queue 0 for low
priority traffic. 3 traffic classes are defined for eth1 and mapped to:
TC0 - low priority, TX CPPI queue 0 -> ext Port 1 fifo0, no rate limit
TC1 - prio 2, TX CPPI queue 1 -> ext Port 1 fifo1, CIR=100Mbit/s, EIR=1Mbit/s
TC2 - prio 3, TX CPPI queue 2 -> ext Port 1 fifo2, CIR=200Mbit/s, EIR=2Mbit/s
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmTb9twNHGZ3QHN0cmxl
bi5kZQAKCRBwkajZrV/2ANr7D/wN/XTDG3JxLl4VUtmSOBrD6y6QwpqfUYBD11Ev
eAXpP69wxh2J9gPqtVsPAJwbc0F2eca38ziyJ9+4hmNWNBc3Hh1oXj/9e0IqdPUP
9AEHu73jLmeb6bN0RU8guxmipwZq/a4Q6y/OYPhf+c0uULYEWdH6AAHs3WGRCWHI
a0gedUU3ChKDueObHWfaZSqGuMVKS+eCfT57oKc/l2J2b2064JAksKMDglsDmsA/
VQw8Ko+l1PO7t8mOswPufGYyg5tKUXpq8AJ3Dlg2l2Qzws29FEIfzLLbu89GHlvP
FYSjgOuVdwVra/Kt7jQzUxeGrBXZC8MuaGnEOi/tMXidw0uc+N1y2Bg2N6eFkmxW
AN5e4p0S3ddfbGaEVoDx5aS6kKTDCQAvgeaM+KIExmbMJQ77FEqnq/WqyDd/eHJL
5Su1nPoIGzswSzcYC6eh6AEnesx01OdKCZtMfF1LxutopMLItsBe3UxPRglJfvvX
XxuosrZe1aIOgCSQGTeP6DqpTVEOsvNThhxapKt1AeW+YfnzGwZD0hU6vOq9ZSHS
K+wRWfhipS5nt8zTv8SbM2DgOmD0pPcsiueAXNQUW7gUUXleLOvP8tden91M/37C
eGMuZqdWRAjJtU0q4QvhiLSvlI7Awh2dsr0Vgh4k6b1oyuk3UrwKJeoZW07g9MR4
VLRIUg==
=5ePi
-----END PGP SIGNATURE-----
Merge tag 'nf-23-08-16' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florisn Westphal says:
====================
These are netfilter fixes for the *net* tree.
First patch resolves a false-positive lockdep splat:
rcu_dereference is used outside of rcu read lock. Let lockdep
validate that the transaction mutex is locked.
Second patch fixes a kdoc warning added in previous PR.
Third patch fixes a memory leak:
The catchall element isn't disabled correctly, this allows
userspace to deactivate the element again. This results in refcount
underflow which in turn prevents memory release. This was always
broken since the feature was added in 5.13.
Patch 4 fixes an incorrect change in the previous pull request:
Adding a duplicate key to a set should work if the duplicate key
has expired, restore this behaviour. All from myself.
Patch #5 resolves an old historic artifact in sctp conntrack:
a 300ms timeout for shutdown_ack. Increase this to 3s. From Xin Long.
Patch #6 fixes a sysctl data race in ipvs, two threads can clobber the
sysctl value, from Sishuai Gong. This is a day-0 bug that predates git
history.
Patches 7, 8 and 9, from Pablo Neira Ayuso, are also followups
for the previous GC rework in nf_tables: The netlink notifier and the
netns exit path must both increment the gc worker seqcount, else worker
may encounter stale (free'd) pointers.
================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
inet: socket lock and data-races avoidance
In this series, I converted 20 bits in "struct inet_sock" and made
them truly atomic.
This allows to implement many IP_ socket options in a lockless
fashion (no need to acquire socket lock), and fixes data-races
that were showing up in various KCSAN reports.
I also took care of IP_TTL/IP_MINTTL, but left few other options
for another series.
v4: Rebased after recent mptcp changes.
Added Reviewed-by: tags from Simon (thanks !)
v3: fixed patch 7, feedback from build bot about ipvs set_mcast_loop()
v2: addressed a feedback from a build bot in patch 9 by removing
unused issk variable in mptcp_setsockopt_sol_ip_set_transparent()
Added Acked-by: tags from Soheil (thanks !)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
inet->min_ttl is already read with READ_ONCE().
Implementing IP_MINTTL socket option set/read
without holding the socket lock is easy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_select_ttl() is racy, because it reads inet->uc_ttl
without proper locking.
Add READ_ONCE()/WRITE_ONCE() annotations while
allowing IP_TTL socket option to be set/read without
holding the socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make room in struct inet_sock by removing this bit field,
using one available bit in inet_flags instead.
Also move local_port_range to fill the resulting hole,
saving 8 bytes on 64bit arches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_BIND_ADDRESS_NO_PORT socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_NODEFRAG socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We move single bit fields to inet->inet_flags to avoid races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_TRANSPARENT socket option can now be set/read
without locking the socket.
v2: removed unused issk variable in mptcp_setsockopt_sol_ip_set_transparent()
v4: rebased after commit 3f326a821b ("mptcp: change the mpc check helper to return a sk")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_MULTICAST_ALL socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_MULTICAST_LOOP socket option can now be set/read
without locking the socket.
v3: fix build bot error reported in ipvs set_mcast_loop()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_HDRINCL socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_FREEBIND socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_RECVERR_RFC4884 socket option can now be set/read
without locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_RECVERR socket option can now be set/get without locking the socket.
This patch potentially avoid data-races around inet->recverr.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we have inet->inet_flags, we can set following options
without having to hold the socket lock:
IP_PKTINFO, IP_RECVTTL, IP_RECVTOS, IP_RECVOPTS, IP_RETOPTS,
IP_PASSSEC, IP_RECVORIGDSTADDR, IP_RECVFRAGSIZE.
ip_sock_set_pktinfo() no longer hold the socket lock.
Similarly we can get the following options whithout holding
the socket lock:
IP_PKTINFO, IP_RECVTTL, IP_RECVTOS, IP_RECVOPTS, IP_RETOPTS,
IP_PASSSEC, IP_RECVORIGDSTADDR, IP_CHECKSUM, IP_RECVFRAGSIZE.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various inet fields are currently racy.
do_ip_setsockopt() and do_ip_getsockopt() are mostly holding
the socket lock, but some (fast) paths do not.
Use a new inet->inet_flags to hold atomic bits in the series.
Remove inet->cmsg_flags, and use instead 9 bits from inet_flags.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix indentation of a type attribute of IPV6_VTI config entry.
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ruan Jinjie says:
====================
net: Remove redundant of_match_ptr() macro
Since these net drivers depend on CONFIG_OF, there is
no need to wrap the macro of_match_ptr() here.
Changes in v3:
- Collect responses from v1 and v2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the module_misc_device macro to simplify the code, which is the
same as declaring with module_init() and module_exit().
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retire some of my email addresses from Kernel activities.
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmTbRoQACgkQrB3Eaf9P
W7eY+A/8DJtqwFs1uAahS9jCX1bxf3vKKUPkEKu3IfcZVv2WcMVDI7XRLnBb93PC
GR1RQskCErXMrVv7mmaBuk/uZAcAFQUkzna3MmyAw4lFfJSWORD6rzGiFeDVMsvx
7gczhYC6aPwhiyAqk6eoNTKxLaZ0zfGiW9ZKWdZXuTjp+ijksa56gEdKsPwMQIht
FE4+CHia0dxFK0bUZMLHc4ixQbqKkHj/qVxB8k8zQnDgmCavjlEAnc+PAOX+SNxm
uju4gDV/9qXYOkHTwRD9/aPcvCofTlD9XynSHkMC24yLS6Ir4A1mFUZywNSiwcgX
//WxymD1N93inuHGzVluhm6Jy+4hTaS5p1y+H86L2TfC9b5SOrNYtj3yLB3aqDgq
1+4t4cVAtpk7uLfPYKzreDJH+CoxQDC8x+0dlzQUGnV11eIJ2RA0brJhFqHjOlbD
SAQtBwkPqlAXnrdDr2pUhyrlrwAGXux8T5u5tF3NSS3FEwh7akRBfU2HV6vPEE80
qPIxHSbA9d0j+tOjbkHIYEv9fMHHFC/aFLZMYOew016TKGBJth4g+DJqJiEEDZZh
iEIC62lrMV2qsyW5PdYdxGesZaAC/4koFCTBgkBYyIC/4gm5E74Ygu5B2A3xx89p
H6MlF3Miofsf9aSh1vw4cyb69mPaP1XG95OGTqc0qpV2XhhjpE0=
=UVTC
-----END PGP SIGNATURE-----
Merge tag 'ipsec-2023-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
1) Fix a slab-out-of-bounds read in xfrm_address_filter.
From Lin Ma.
2) Fix the pfkey sadb_x_filter validation.
From Lin Ma.
3) Use the correct nla_policy structure for XFRMA_SEC_CTX.
From Lin Ma.
4) Fix warnings triggerable by bad packets in the encap functions.
From Herbert Xu.
5) Fix some slab-use-after-free in decode_session6.
From Zhengchao Shao.
6) Fix a possible NULL piointer dereference in xfrm_update_ae_params.
Lin Ma.
7) Add a forgotten nla_policy for XFRMA_MTIMER_THRESH.
From Lin Ma.
8) Don't leak offloaded policies.
From Leon Romanovsky.
9) Delete also the offloading part of an acquire state.
From Leon Romanovsky.
Please pull or let me know if there are problems.
Jijie Shao says:
====================
hns3: refactor registers information for ethtool -d
refactor registers information for ethtool -d
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In the original RPU query command, the status register values of
multiple RPU tunnels are accumulated by default, which is unreasonable.
This patch Fix it by querying the specified tunnel ID.
The tunnel number of the device can be obtained from firmware
during initialization.
Fixes: ddb54554fa ("net: hns3: add DFX registers information for ethtool -d")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dump register function is being refactored.
The third step in refactoring is to support tlv info in regs data for
HNS3 PF driver.
Currently, if we use "ethtool -d" to dump regs value,
the output is as follows:
offset1: 00 01 02 03 04 05 ...
offset2:10 11 12 13 14 15 ...
......
We can't get the value of a register directly.
This patch deletes the original separator information and
add tag_len_value information in regs data.
ethtool can parse register data in key-value format by -d command.
a patch will be added to the ethtool to parse regs data
in the following format:
reg1 : value2
reg2 : value2
......
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dump register function is being refactored.
The second step in refactoring is to support tlv info in regs data for
HNS3 PF driver.
Currently, if we use "ethtool -d" to dump regs value,
the output is as follows:
offset1: 00 01 02 03 04 05 ...
offset2:10 11 12 13 14 15 ...
......
We can't get the value of a register directly.
This patch deletes the original separator information and
add tag_len_value information in regs data.
ethtool can parse register data in key-value format by -d command.
a patch will be added to the ethtool to parse regs data
in the following format:
reg1 : value2
reg2 : value2
......
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dump register function is being refactored.
The first step in refactoring is put the dump regs function
into a separate file.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Fang says:
====================
net: fec: add XDP_TX feature support
This patch set is to support the XDP_TX feature of FEC driver, the first
patch is add initial XDP_TX support, and the second patch improves the
performance of XDP_TX by not using xdp_convert_buff_to_frame(). Please
refer to the commit message of each patch for more details.
====================
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Jesper and Alexander, we can avoid converting xdp_buff
to xdp_frame in case of XDP_TX to save a bunch of CPU cycles, so that
we can further improve the XDP_TX performance.
Before this patch on i.MX8MP-EVK board, the performance shows as follows.
root@imx8mpevk:~# ./xdp2 eth0
proto 17: 353918 pkt/s
proto 17: 352923 pkt/s
proto 17: 353900 pkt/s
proto 17: 352672 pkt/s
proto 17: 353912 pkt/s
proto 17: 354219 pkt/s
After applying this patch, the performance is improved.
root@imx8mpevk:~# ./xdp2 eth0
proto 17: 369261 pkt/s
proto 17: 369267 pkt/s
proto 17: 369206 pkt/s
proto 17: 369214 pkt/s
proto 17: 369126 pkt/s
proto 17: 369272 pkt/s
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Suggested-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The XDP_TX feature is not supported before, and all the frames
which are deemed to do XDP_TX action actually do the XDP_DROP
action. So this patch adds the XDP_TX support to FEC driver.
I tested the performance of XDP_TX in XDP_DRV mode and XDP_SKB
mode respectively on i.MX8MP-EVK platform, and as suggested by
Jesper, I also tested the performance of XDP_REDIRECT on the
same platform. And the test steps and results are as follows.
XDP_TX test:
Step 1: One board is used as generator and connects to switch,and
the FEC port of DUT also connects to the switch. Both boards with
flow control off. Then the generator runs the
pktgen_sample03_burst_single_flow.sh script to generate and send
burst traffic to DUT. Note that the size of packet was set to 64
bytes and the procotol of packet was UDP in my test scenario. In
addition, the SMAC of the packet need to be different from the MAC
of the generator, because the xdp2 program will swap the DMAC and
SMAC of the packet and send it back to the generator. If the SMAC
of the generated packet is the MAC of the generator, the generator
will receive the returned traffic which increase the CPU loading
and significantly degrade the transmit speed of the generator, and
finally it affects the test of XDP_TX performance.
Step 2: The DUT runs the xdp2 program to transmit received UDP
packets back out on the same port where they were received.
root@imx8mpevk:~# ./xdp2 eth0
proto 17: 353918 pkt/s
proto 17: 352923 pkt/s
proto 17: 353900 pkt/s
proto 17: 352672 pkt/s
proto 17: 353912 pkt/s
proto 17: 354219 pkt/s
root@imx8mpevk:~# ./xdp2 -S eth0
proto 17: 160604 pkt/s
proto 17: 160708 pkt/s
proto 17: 160564 pkt/s
proto 17: 160684 pkt/s
proto 17: 160640 pkt/s
proto 17: 160720 pkt/s
The above results show that the XDP_TX performance of XDP_DRV mode
is much better than XDP_SKB mode, more than twice that of XDP_SKB
mode, which is in line with our expectation.
XDP_REDIRECT test:
Step1: Both the generator and the FEC port of the DUT connet to the
switch port. All the ports with flow control off, then the generator
runs the pktgen script to generate and send burst traffic to DUT.
Note that the size of packet was set to 64 bytes and the procotol of
packet was UDP in my test scenario.
Step2: The DUT runs the xdp_redirect program to redirect the traffic
from the FEC port to the FEC port itself.
root@imx8mpevk:~# ./xdp_redirect eth0 eth0
Redirecting from eth0 (ifindex 2; driver fec) to eth0
(ifindex 2; driver fec)
Summary 232,302 rx/s 0 err,drop/s 232,344 xmit/s
Summary 234,579 rx/s 0 err,drop/s 234,577 xmit/s
Summary 235,548 rx/s 0 err,drop/s 235,549 xmit/s
Summary 234,704 rx/s 0 err,drop/s 234,703 xmit/s
Summary 235,504 rx/s 0 err,drop/s 235,504 xmit/s
Summary 235,223 rx/s 0 err,drop/s 235,224 xmit/s
Summary 234,509 rx/s 0 err,drop/s 234,507 xmit/s
Summary 235,481 rx/s 0 err,drop/s 235,482 xmit/s
Summary 234,684 rx/s 0 err,drop/s 234,683 xmit/s
Summary 235,520 rx/s 0 err,drop/s 235,520 xmit/s
Summary 235,461 rx/s 0 err,drop/s 235,461 xmit/s
Summary 234,627 rx/s 0 err,drop/s 234,627 xmit/s
Summary 235,611 rx/s 0 err,drop/s 235,611 xmit/s
Packets received : 3,053,753
Average packets/s : 234,904
Packets transmitted : 3,053,792
Average transmit/s : 234,907
Compared the performance of XDP_TX with XDP_REDIRECT, XDP_TX is also
much better than XDP_REDIRECT. It's also in line with our expectation.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Suggested-by: Jesper Dangaard Brouer <hawk@kernel.org>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return result of b44_writephy() instead of zero to
deal with possible error.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When run command "ip netns delete client", device link1_1 has been
deleted. So, it is no need to delete link1_1 again. Remove it.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Handle PTP out of order CQEs issue
2) Check FW status before determining reset successful
3) Expose maximum supported SFs via devlink resource
4) MISC cleanups
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmTan0YACgkQSD+KveBX
+j7Cugf/Smu/Pc8UWHXpOoqd7glZhxKhA+DruyltC3cS/XJSwG/cA+sjXUo4SCnR
TPP3ucd3lyIfwzBAJPJgpJDv+uznDhM4VVkD/RbTT5JXpW9aZ/vxwtmBggZ93YXN
1aQdSr+rvdJWx8hV6XYtkP6GQDqGugL3/lYcwEGSrOU+cds8RJzTnltEHIcc9ldr
gj7/N6JmliYvyDXqN7tK2YLSVAd0oouVDr19qDIYORctLnYHnCg06aTzqDJH1Hfh
xltQMPy+EtfCnKrsMr7Hkd9INrEECkP5l27a4PuTbFQDzCdzLClbAdAS3Mq4EG+q
e9byCdAlpLoBe5Sjmb4baCy/F4mkYg==
=6IRd
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2023-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-08-14
1) Handle PTP out of order CQEs issue
2) Check FW status before determining reset successful
3) Expose maximum supported SFs via devlink resource
4) MISC cleanups
* tag 'mlx5-updates-2023-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Don't query MAX caps twice
net/mlx5: Remove unused MAX HCA capabilities
net/mlx5: Remove unused CAPs
net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler()
net/mlx5: Remove redundant check of mlx5_vhca_event_supported()
net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN()
net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init()
net/mlx5: Use auxiliary_device_uninit() instead of device_put()
net/mlx5: E-switch, Add checking for flow rule destinations
net/mlx5: Check with FW that sync reset completed successfully
net/mlx5: Expose max possible SFs via devlink resource
net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ
net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs
net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst
====================
Link: https://lore.kernel.org/r/20230814214144.159464-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When developing specs its useful to know which attr space
YNL was trying to find an attribute in on key error.
Instead of printing:
KeyError: 0
add info about the space:
Exception: Space 'vport' has no attribute with value '0'
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20230814205627.2914583-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the xarray changes we mix returning valid ifindex and negative
errno in a single int returned from dev_index_reserve(). This depends
on the fact that ifindexes can't be negative. Otherwise we may insert
into the xarray and return a very large negative value. This in turn
may break ERR_PTR().
OvS is susceptible to this problem and lacking validation (fix posted
separately for net).
Reject negative ifindex explicitly. Add a warning because the input
validation is better handled by the caller.
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230814205627.2914583-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recent changes in net-next (commit 759ab1edb5 ("net: store netdevs
in an xarray")) refactored the handling of pre-assigned ifindexes
and let syzbot surface a latent problem in ovs. ovs does not validate
ifindex, making it possible to create netdev ports with negative
ifindex values. It's easy to repro with YNL:
$ ./cli.py --spec netlink/specs/ovs_datapath.yaml \
--do new \
--json '{"upcall-pid": 1, "name":"my-dp"}'
$ ./cli.py --spec netlink/specs/ovs_vport.yaml \
--do new \
--json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}'
$ ip link show
-65536: some-port0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 7a:48:21:ad:0b:fb brd ff:ff:ff:ff:ff:ff
...
Validate the inputs. Now the second command correctly returns:
$ ./cli.py --spec netlink/specs/ovs_vport.yaml \
--do new \
--json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}'
lib.ynl.NlError: Netlink error: Numerical result out of range
nl_len = 108 (92) nl_flags = 0x300 nl_type = 2
error: -34 extack: {'msg': 'integer out of range', 'unknown': [[type:4 len:36] b'\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x03\x00\xff\xff\xff\x7f\x00\x00\x00\x00\x08\x00\x01\x00\x08\x00\x00\x00'], 'bad-attr': '.ifindex'}
Accept 0 since it used to be silently ignored.
Fixes: 54c4ef34c4 ("openvswitch: allow specifying ifindex of new interfaces")
Reported-by: syzbot+7456b5dcf65111553320@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230814203840.2908710-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mario reports that loading r8152 on his system leads to a:
netif_napi_add_weight() called with weight 256
warning getting printed. We don't have any solid data
on why such high budget was chosen, and it may cause
stalls in processing other softirqs and rt threads.
So try to switch back to the default (64) weight.
If this slows down someone's system we should investigate
which part of stopping starting the NAPI poll in this
driver are expensive.
Reported-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/all/0bfd445a-81f7-f702-08b0-bd5a72095e49@amd.com/
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20230814153521.2697982-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit bdfe2da6ae ("e1000e: cosmetic move of function prototypes to the new mac.h")
declared but never implemented them.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230814135821.4808-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Newer versions of clang warn about this variable being assigned but
never used:
drivers/net/ethernet/qlogic/qed/qed_vf.c:63:67: error: parameter 'resp_size' set but not used [-Werror,-Wunused-but-set-parameter]
There is no indication in the git history on how this was ever
meant to be used, so just remove the entire calculation and argument
passing for it to avoid the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230814074512.1067715-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to commit 01f4fd2708 ("bonding: Fix incorrect deletion of
ETH_P_8021AD protocol vid from slaves"), we can trigger BUG_ON(!vlan_info)
in unregister_vlan_dev() with the following testcase:
# ip netns add ns1
# ip netns exec ns1 ip link add team1 type team
# ip netns exec ns1 ip link add team_slave type veth peer veth2
# ip netns exec ns1 ip link set team_slave master team1
# ip netns exec ns1 ip link add link team_slave name team_slave.10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link add link team1 name team1.10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link set team_slave nomaster
# ip netns del ns1
Add S-VLAN tag related features support to team driver. So the team driver
will always propagate the VLAN info to its slaves.
Fixes: 8ad227ff89 ("net: vlan: add 802.1ad support")
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230814032301.2804971-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement netdev trigger and primitive bliking offloading as well as
simple set_brigthness function for both PHY LEDs of the in-SoC PHYs
found in MT7981 and MT7988.
For MT7988, read boottrap register and apply LED polarities accordingly
to get uniform behavior from all LEDs on MT7988.
This requires syscon phandle 'mediatek,pio' present in parenting MDIO bus
which should point to the syscon holding the boottrap register.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/dc324d48c00cd7350f3a506eaa785324cae97372.1691977904.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>