In preparation for multi-CPU support, set CPU port LOOKUP MEMBER outside
the port loop and setup the LOOKUP MEMBER mask for user ports only to
the first CPU port.
This is to handle flooding condition where every CPU port is set as
target and prevent packet duplication for unknown frames from user ports.
Secondary CPU port LOOKUP MEMBER mask will be setup later when
port_change_master will be implemented.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20230730074113.21889-3-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Address learning should initially be turned off by the driver for port
operation in standalone mode, then the DSA core handles changes to it
via ds->ops->port_bridge_flags().
Currently this is not the case for qca8k where learning is enabled
unconditionally in qca8k_setup for every user port.
Handle ports configured in standalone mode by making the learning
configurable and not enabling it by default.
Implement .port_pre_bridge_flags and .port_bridge_flags dsa ops to
enable learning for bridge that request it and tweak
.port_stp_state_set to correctly disable learning when port is
configured in standalone mode.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20230730074113.21889-2-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently checksum is recalculated and dsa tag stripped even if we later
don't find the dev.
To improve code, exit early if we don't find the dev and skip additional
operation on the skb since it will be freed anyway.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20230730074113.21889-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pedro Tammela says:
====================
net/sched: improve class lifetime handling
Valis says[0]:
============
Three classifiers (cls_fw, cls_u32 and cls_route) always copy
tcf_result struct into the new instance of the filter on update.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
============
Turns out these could have been spotted easily with proper warnings.
Improve the current class lifetime with wrappers that check for
overflow/underflow.
While at it add an extack for when a class in use is deleted.
[0] https://lore.kernel.org/all/20230721174856.3045-1-sec@valis.email/
====================
Link: https://lore.kernel.org/r/20230728153537.1865379-1-pctammela@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add extack to warn that delete was rejected because
the class is still in use
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add extack to warn that delete was rejected because
the class is still in use
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add extack to warn that delete was rejected because
the class is still in use
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add extack to warn that delete was rejected because
the class is still in use
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The 'filter_cnt' counter is used to control a Qdisc class lifetime.
Each filter referecing this class by its id will eventually
increment/decrement this counter in their respective
'add/update/delete' routines.
As these operations are always serialized under rtnl lock, we don't
need an atomic type like 'refcount_t'.
It also means that we lose the overflow/underflow checks already
present in refcount_t, which are valuable to hunt down bugs
where the unsigned counter wraps around as it aids automated tools
like syzkaller to scream in such situations.
Wrap the open coded increment/decrement into helper functions and
add overflow checks to the operations.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When handling deduplicated compressed data, there can be multiple
decompressed extents pointing to the same compressed data in one shot.
In such cases, the bvecs which belong to the longest extent will be
selected as the primary bvecs for real decompressors to decode and the
other duplicated bvecs will be directly copied from the primary bvecs.
Previously, only relative offsets of the longest extent were checked to
decompress the primary bvecs. On rare occasions, it can be incorrect
if there are several extents with the same start relative offset.
As a result, some short bvecs could be selected for decompression and
then cause data corruption.
For example, as Shijie Sun reported off-list, considering the following
extents of a file:
117: 903345.. 915250 | 11905 : 385024.. 389120 | 4096
...
119: 919729.. 930323 | 10594 : 385024.. 389120 | 4096
...
124: 968881.. 980786 | 11905 : 385024.. 389120 | 4096
The start relative offset is the same: 2225, but extent 119 (919729..
930323) is shorter than the others.
Let's restrict the bvec length in addition to the start offset if bvecs
are not full.
Reported-by: Shijie Sun <sunshijie@xiaomi.com>
Fixes: 5c2a64252c ("erofs: introduce partial-referenced pclusters")
Tested-by Shijie Sun <sunshijie@xiaomi.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230719065459.60083-1-hsiangkao@linux.alibaba.com
Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC
allocation later in sk_psock_init_link on PREEMPT_RT kernels, since
GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist
patchset notes for details).
This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to
BUG (sleeping function called from invalid context) on RT kernels.
preempt_disable was introduced together with lock_sk and rcu_read_lock
in commit 99ba2b5aba ("bpf: sockhash, disallow bpf_tcp_close and update
in parallel"), probably to match disabled migration of BPF programs, and
is no longer necessary.
Remove preempt_disable to fix BUG in sock_map_update_common on RT.
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/
Fixes: 99ba2b5aba ("bpf: sockhash, disallow bpf_tcp_close and update in parallel")
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230728064411.305576-1-tglozar@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Matthieu Baerts says:
====================
mptcp: cleanup and improvements in the selftests
This small series of 4 patches adds some improvements in MPTCP
selftests:
- Patch 1 reworks the detailed report of mptcp_join.sh selftest to
better display what went well or wrong per test.
- Patch 2 adds colours (if supported, forced and/or not disabled) in
mptcp_join.sh selftest output to help spotting issues.
- Patch 3 modifies an MPTCP selftest tool to interact with the
path-manager via Netlink to always look for errors if any. This makes
sure odd behaviours can be seen in the logs and errors can be caught
later if needed.
- Patch 4 removes stdout and stderr redirections to /dev/null when using
pm_nl_ctl if no errors are expected in order to log odd behaviours.
====================
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-0-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All pm_nl_ctl commands were muted. If there was an unexpected error with
one of them, this was simply not visible in the logs, making the
analysis very hard. It could also hide misuse of commands by mistake.
Now the output is only muted when we do expect to have an error, e.g.
when giving invalid arguments on purpose.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-4-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a Netlink command for the MPTCP path-managers is not valid, it is
important to check if there are errors. If yes, they need to be reported
instead of being ignored and exiting without errors.
Now if no replies are expected, an ACK from the kernelspace is asked by
the userspace in order to always expect a reply. We can use the same
buffer that is currently always >1024 bytes. Then we can check if there
is an error (err->error), print it if any and report the error.
After this modification, it is required to mute expected errors in
mptcp_join.sh and pm_netlink.sh selftests:
- when trying to add a bad endpoint, e.g. duplicated
- when trying to set the two limits above the hard limit
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-3-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thanks to the parent commit, it is easy to change the output and add
some colours to help spotting issues.
The colours are not used if stdout is redirected or if NO_COLOR env var
is set to 1 as specified in https://no-color.org.
It is possible to force displaying the colours even if stdout is
redirected by setting this env var:
SELFTESTS_MPTCP_LIB_COLOR_FORCE=1
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-2-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch modifies how the detailed results are printed, mainly to
improve what is displayed in case of issue:
- Now the test name (title) is printed earlier, when starting the test
if it is not intentionally skipped: by doing that, errors linked to
a test will be printed after having written the test name and then
avoid confusions.
- Due to the previous item, it is required to add a new line after
having printed the test name because in case of error with a command,
it is better not to have the output in the middle of the screen.
- Each check is printed on a dedicated line with aligned status (ok,
skip, fail): it is easier to spot which one has failed, simpler to
manage in the code not having to deal with alignment case by case and
helpers can be used to uniform what is done. These helpers can also be
useful later to do more actions depending on the results or change in
one place what is printed.
- Info messages have been reduced and aligned as well. And info messages
about the creation of the default test files of 1 KB are no longer
printed.
Example:
001 no JOIN
syn [ ok ]
synack [ ok ]
ack [ ok ]
Or with a skip and a failure:
001 no JOIN
syn [ ok ]
synack [fail] got 42 JOIN[s] synack expected 0
Server ns stats
(...)
Client ns stats
(...)
ack [skip]
Or with info:
104 Infinite map
Test file (size 128 KB) for client
Test file (size 128 KB) for server
file received by server has inverted byte at 169
5 corrupted pkts
syn [ ok ]
synack [ ok ]
While at it, verify_listener_events() now also print more info in case
of failure and in pm_nl_check_endpoint(), the test is marked as failed
instead of skipped if no ID has been given (internal selftest issue).
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-1-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit f421436a59 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
introducted these but never implemented.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230729123456.36340-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix input argument parsing paths to skip from their error legs.
This fix helps to avoid false test failure reports without running
the test.
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Link: https://lore.kernel.org/r/20230729002403.4278-1-skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
valis says:
====================
net/sched Bind logic fixes for cls_fw, cls_u32 and cls_route
Three classifiers (cls_fw, cls_u32 and cls_route) always copy
tcf_result struct into the new instance of the filter on update.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
This patch set fixes this issue in all affected classifiers by no longer
copying the tcf_result struct from the old filter.
====================
Link: https://lore.kernel.org/r/20230729123202.72406-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When route4_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: 1109c00547 ("net: sched: RCU cls_route")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When fw_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: e35a8ee599 ("net: sched: fw use RCU")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When u32_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: de5df63228 ("net: sched: cls_u32 changes to knode must appear atomic to readers")
Reported-by: valis <sec@valis.email>
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hou Tao says:
====================
Patchset "Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP
maps" [0] changed per-map flush list to global per-cpu flush list
for cpumap, devmap and xskmap, but it forgot to remove these unused
fields from cpumap and devmap. So just remove these unused fields.
Comments and suggestions are always welcome.
[0]: https://lore.kernel.org/bpf/20191219061006.21980-1-bjorn.topel@gmail.com
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Commit 96360004b8 ("xdp: Make devmap flush_list common for all map
instances") removes the use of bpf_dtab_netdev::dtab in bq_enqueue(),
so just remove dtab from bpf_dtab_netdev.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230728014942.892272-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Since commit cdfafe98ca ("xdp: Make cpumap flush_list common for all
map instances"), cmap is no longer used, so just remove it.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230728014942.892272-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
syzbot reported an array-index-out-of-bounds when printing out bpf
insns. Further investigation shows the insn is illegal but
is printed out due to log level 1 or 2 before actual insn verification
in do_check().
This particular illegal insn is a MOVSX insn with offset value 2.
The legal offset value for MOVSX should be 8, 16 and 32.
The disasm sign-extension-size array index is calculated as
(insn->off / 8) - 1
and offset value 2 gives an out-of-bound index -1.
Tighten the checking for MOVSX insn in disasm.c to avoid
array-index-out-of-bounds issue.
Reported-by: syzbot+3758842a6c01012aa73b@syzkaller.appspotmail.com
Fixes: f835bb6222 ("bpf: Add kernel/bpftool asm support for new instructions")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230731204534.1975311-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao says:
====================
The patchset fixes two reported warning in cpu-map when running
xdp_redirect_cpu and some RT threads concurrently. Patch #1 fixes
the warning in __cpu_map_ring_cleanup() when kthread is stopped
prematurely. Patch #2 fixes the warning in __xdp_return() when
there are pending skbs in ptr_ring.
Please see individual patches for more details. And comments are always
welcome.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The following warning was reported when running xdp_redirect_cpu with
both skb-mode and stress-mode enabled:
------------[ cut here ]------------
Incorrect XDP memory type (-2128176192) usage
WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405
Modules linked in:
CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G 6.5.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: events __cpu_map_entry_free
RIP: 0010:__xdp_return+0x1e4/0x4a0
......
Call Trace:
<TASK>
? show_regs+0x65/0x70
? __warn+0xa5/0x240
? __xdp_return+0x1e4/0x4a0
......
xdp_return_frame+0x4d/0x150
__cpu_map_entry_free+0xf9/0x230
process_one_work+0x6b0/0xb80
worker_thread+0x96/0x720
kthread+0x1a5/0x1f0
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
</TASK>
The reason for the warning is twofold. One is due to the kthread
cpu_map_kthread_run() is stopped prematurely. Another one is
__cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in
ptr_ring as XDP frames.
Prematurely-stopped kthread will be fixed by the preceding patch and
ptr_ring will be empty when __cpu_map_ring_cleanup() is called. But
as the comments in __cpu_map_ring_cleanup() said, handling and freeing
skbs in ptr_ring as well to "catch any broken behaviour gracefully".
Fixes: 11941f8a85 ("bpf: cpumap: Implement generic cpumap")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230729095107.1722450-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The following warning was reported when running stress-mode enabled
xdp_redirect_cpu with some RT threads:
------------[ cut here ]------------
WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135
CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: events cpu_map_kthread_stop
RIP: 0010:put_cpu_map_entry+0xda/0x220
......
Call Trace:
<TASK>
? show_regs+0x65/0x70
? __warn+0xa5/0x240
......
? put_cpu_map_entry+0xda/0x220
cpu_map_kthread_stop+0x41/0x60
process_one_work+0x6b0/0xb80
worker_thread+0x96/0x720
kthread+0x1a5/0x1f0
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
</TASK>
The root cause is the same as commit 4369016497 ("bpf: cpumap: Fix memory
leak in cpu_map_update_elem"). The kthread is stopped prematurely by
kthread_stop() in cpu_map_kthread_stop(), and kthread() doesn't call
cpu_map_kthread_run() at all but XDP program has already queued some
frames or skbs into ptr_ring. So when __cpu_map_ring_cleanup() checks
the ptr_ring, it will find it was not emptied and report a warning.
An alternative fix is to use __cpu_map_ring_cleanup() to drop these
pending frames or skbs when kthread_stop() returns -EINTR, but it may
confuse the user, because these frames or skbs have been handled
correctly by XDP program. So instead of dropping these frames or skbs,
just make sure the per-cpu kthread is running before
__cpu_map_entry_alloc() returns.
After apply the fix, the error handle for kthread_stop() will be
unnecessary because it will always return 0, so just remove it.
Fixes: 6710e11269 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230729095107.1722450-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
During unregister_netdevice_many_notify(), the ordering of our concerned
function calls is like this:
unregister_netdevice_many_notify
dev_shutdown
qdisc_put
clsact_destroy
tcx_uninstall
The syzbot reproducer triggered a case that the qdisc refcnt is not
zero during dev_shutdown().
tcx_uninstall() will then WARN_ON_ONCE(tcx_entry(entry)->miniq_active)
because the miniq is still active and the entry should not be freed.
The latter assumed that qdisc destruction happens before tcx teardown.
This fix is to avoid tcx_uninstall() doing tcx_entry_free() when the
miniq is still alive and let the clsact_destroy() do the free later, so
that we do not assume any specific ordering for either of them.
If still active, tcx_uninstall() does clear the entry when flushing out
the prog/link. clsact_destroy() will then notice the "!tcx_entry_is_active()"
and then does the tcx_entry_free() eventually.
Fixes: e420bed025 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com
Reported-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/222255fe07cb58f15ee662e7ee78328af5b438e4.1690549248.git.daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is no need to spam the kernel log with such an indication, remove
this message.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230728183945.760531-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These are never implemented since introduction in
commit d021c34405 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20230729122036.32988-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit f9aab6f2ce ("net/smc: immediate freeing in smc_lgr_cleanup_early()")
left behind smc_lgr_schedule_free_work_fast() declaration.
And since commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
smc_ib_modify_qp_reset() is not used anymore.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20230729121929.17180-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shuah Khan says:
====================
Connector/proc_filter test fixes
The first patch fixes the LKFT reported compile error, second
one adds .gitignore.
====================
Applying the first 2 patches, third one resent separately.
Link: https://lore.kernel.org/r/cover.1690564372.git.skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test compile fails with following errors. Fix the Makefile
CFLAGS to include KHDR_INCLUDES to pull in uapi defines.
gcc -Wall proc_filter.c -o ../tools/testing/selftests/connector/proc_filter
proc_filter.c: In function ‘send_message’:
proc_filter.c:22:33: error: invalid application of ‘sizeof’ to incomplete type ‘struct proc_input’
22 | sizeof(struct proc_input))
| ^~~~~~
proc_filter.c:42:19: note: in expansion of macro ‘NL_MESSAGE_SIZE’
42 | char buff[NL_MESSAGE_SIZE];
| ^~~~~~~~~~~~~~~
proc_filter.c:22:33: error: invalid application of ‘sizeof’ to incomplete type ‘struct proc_input’
22 | sizeof(struct proc_input))
| ^~~~~~
proc_filter.c:48:34: note: in expansion of macro ‘NL_MESSAGE_SIZE’
48 | hdr->nlmsg_len = NL_MESSAGE_SIZE;
| ^~~~~~~~~~~~~~~
`
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/all/CA+G9fYt=6ysz636XcQ=-KJp7vJcMZ=NjbQBrn77v7vnTcfP2cA@mail.gmail.com/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/d0055c8cdf18516db8ba9edec99cfc5c08f32a7c.1690564372.git.skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace uses of i40e_status to as equivalent as possible error codes.
Remove enum i40e_status as it is no longer needed
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230728171336.2446156-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 8a59f9d1e3 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
left behind tcp_bpf_get_proto() declaration. And tcp_v4_tw_remember_stamp()
function is remove in ccb7c410dd ("timewait_sock: Create and use getpeer op.").
Since commit 686989700c ("tcp: simplify tcp_mark_skb_lost")
tcp_skb_mark_lost_uncond_verify() declaration is not used anymore.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230729122644.10648-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two mbox-related mutexes are destroyed in octep_ctrl_mbox_uninit(),
but the corresponding mutex_init calls were missing.
A "DEBUG_LOCKS_WARN_ON(lock->magic != lock)" warning was emitted with
CONFIG_DEBUG_MUTEXES on.
Initialize the two mutexes in octep_ctrl_mbox_init().
Fixes: 577f0d1b1c ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230729151516.24153-1-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similarly to other recently fixed drivers make sure we don't
try to access XDP or page pool APIs when NAPI budget is 0.
NAPI budget of 0 may mean that we are in netpoll.
This may result in running software IRQs in hard IRQ context,
leading to deadlocks or crashes.
To make sure bnapi->tx_pkts don't get wiped without handling
the events, move clearing the field into the handler itself.
Remember to clear tx_pkts after reset (bnxt_enable_napi())
as it's technically possible that netpoll will accumulate
some tx_pkts and then a reset will happen, leaving tx_pkts
out of sync with reality.
Fixes: 322b87ca55 ("bnxt_en: add page_pool support")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230728205020.2784844-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During qdisc create/delete, it is necessary to rebuild the queue
of VSIs. An error occurred because the VSIs created by RDMA were
still active.
Added check if RDMA is active. If yes, it disallows qdisc changes
and writes a message in the system logs.
Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Rafal Rogalski <rafalx.rogalski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230728171243.2446101-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a struct_group for the whole packet body so we can copy it in one
go without triggering FORTIFY_SOURCE complaints.
Fixes: cf60ed4696 ("sfc: use padding to fix alignment in loopback test")
Fixes: 30c24dd87f ("sfc: siena: use padding to fix alignment in loopback test")
Fixes: 1186c6b31e ("sfc: falcon: use padding to fix alignment in loopback test")
Reviewed-by: Andy Moreton <andy.moreton@amd.com>
Tested-by: Andy Moreton <andy.moreton@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230728165528.59070-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
devlink_port_region_destroy() is never implemented since
commit 544e7c33ec ("net: devlink: Add support for port regions").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230728132113.32888-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are already INDIRECT_CALLABLE_DECLARE in the hashtable
headers, no need to declare them again.
Fixes: 0f495f7617 ("net: remove duplicate reuseport_lookup functions")
Suggested-by: Martin Lau <martin.lau@linux.dev>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230731-indir-call-v1-1-4cd0aeaee64f@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The timer dev->stat_monitor can schedule the delayed work dev->wq and
the delayed work dev->wq can also arm the dev->stat_monitor timer.
When the device is detaching, the net_device will be deallocated. but
the net_device private data could still be dereferenced in delayed work
or timer handler. As a result, the UAF bugs will happen.
One racy situation is shown below:
(Thread 1) | (Thread 2)
lan78xx_stat_monitor() |
... | lan78xx_disconnect()
lan78xx_defer_kevent() | ...
... | cancel_delayed_work_sync(&dev->wq);
schedule_delayed_work() | ...
(wait some time) | free_netdev(net); //free net_device
lan78xx_delayedwork() |
//use net_device private data |
dev-> //use |
Although we use cancel_delayed_work_sync() to cancel the delayed work
in lan78xx_disconnect(), it could still be scheduled in timer handler
lan78xx_stat_monitor().
Another racy situation is shown below:
(Thread 1) | (Thread 2)
lan78xx_delayedwork |
mod_timer() | lan78xx_disconnect()
| cancel_delayed_work_sync()
(wait some time) | if (timer_pending(&dev->stat_monitor))
| del_timer_sync(&dev->stat_monitor);
lan78xx_stat_monitor() | ...
lan78xx_defer_kevent() | free_netdev(net); //free
//use net_device private data|
dev-> //use |
Although we use del_timer_sync() to delete the timer, the function
timer_pending() returns 0 when the timer is activated. As a result,
the del_timer_sync() will not be executed and the timer could be
re-armed.
In order to mitigate this bug, We use timer_shutdown_sync() to shutdown
the timer and then use cancel_delayed_work_sync() to cancel the delayed
work. As a result, the net_device could be deallocated safely.
What's more, the dev->flags is set to EVENT_DEV_DISCONNECT in
lan78xx_disconnect(). But it could still be set to EVENT_STAT_UPDATE
in lan78xx_stat_monitor(). So this patch put the set_bit() behind
timer_shutdown_sync().
Fixes: 77dfff5bb7 ("lan78xx: Fix race condition in disconnect handling")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Schneider-Pargmann <msp@baylibre.com> says:
This series introduces two new chips tcan-4552 and tcan-4553. The
generic driver works in general but needs a few small changes. These
are caused by the removal of wake and state pins.
Changes in v4:
- Use printk("... %pe\n", ERR_PTR(ret)) for new printks
- Link to v3: https://lore.kernel.org/lkml/20230721135009.1120562-1-msp@baylibre.com
Changes in v3:
- Rebased to v6.5-rc1
- Removed devicetree compatible check in tcan driver. The device version
is now unconditionally detected using the ID2 register
- Link to v2: https://lore.kernel.org/lkml/20230621093103.3134655-1-msp@baylibre.com
Changes in v2:
- Update the binding documentation to specify tcan4552 and tcan4553 with
the tcan4x5x as fallback
- Update the driver to use auto detection as well. If compatible differs
from the ID2 register, use the ID2 register and print a warning.
- Small style changes
- Link to v1: https://lore.kernel.org/lkml/20230314151201.2317134-1-msp@baylibre.com
Link: https://lore.kernel.org/all/20230728141923.162477-1-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
To be able to understand issues during probe easier, add error messages
if something fails.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://lore.kernel.org/all/20230728141923.162477-7-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>