Section 2.2.1 BTF_KIND_INT a bullet list was collapsed due to
text reflow in commit 9ab5305dbe ("docs/btf: reflow text to
fill up to 78 characters").
This patch correct the mistake. Also adjust next bullet list,
which is used for comparison, to get rendered the same way.
Fixes: 9ab5305dbe ("docs/btf: reflow text to fill up to 78 characters")
Link: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-int
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul Chaignon says:
====================
The BPF verifier checks the maximum number of call stack frames twice,
first in the main CFG traversal (do_check) and then in a subsequent
traversal (check_max_stack_depth). If the second check fails, it logs a
'verifier bug' warning and errors out, as the number of call stack frames
should have been verified already.
However, the second check may fail without indicating a verifier bug: if
the excessive function calls reside in dead code, the main CFG traversal
may not visit them; the subsequent traversal visits all instructions,
including dead code.
This case raises the question of how invalid dead code should be treated.
The first patch implements the conservative option and rejects such code;
the second adds a test case.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a test case with an excessive number of call stack frames
in dead code.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Tested-by: Xiao Han <xiao.han@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF verifier checks the maximum number of call stack frames twice,
first in the main CFG traversal (do_check) and then in a subsequent
traversal (check_max_stack_depth). If the second check fails, it logs a
'verifier bug' warning and errors out, as the number of call stack frames
should have been verified already.
However, the second check may fail without indicating a verifier bug: if
the excessive function calls reside in dead code, the main CFG traversal
may not visit them; the subsequent traversal visits all instructions,
including dead code.
This case raises the question of how invalid dead code should be treated.
This patch implements the conservative option and rejects such code.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Tested-by: Xiao Han <xiao.han@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
syzkaller was able to generate the following UAF in bpf:
BUG: KASAN: use-after-free in lookup_last fs/namei.c:2269 [inline]
BUG: KASAN: use-after-free in path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318
Read of size 1 at addr ffff8801c4865c47 by task syz-executor2/9423
CPU: 0 PID: 9423 Comm: syz-executor2 Not tainted 4.20.0-rc1-next-20181109+
#110
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x244/0x39d lib/dump_stack.c:113
print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
lookup_last fs/namei.c:2269 [inline]
path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318
filename_lookup+0x26a/0x520 fs/namei.c:2348
user_path_at_empty+0x40/0x50 fs/namei.c:2608
user_path include/linux/namei.h:62 [inline]
do_mount+0x180/0x1ff0 fs/namespace.c:2980
ksys_mount+0x12d/0x140 fs/namespace.c:3258
__do_sys_mount fs/namespace.c:3272 [inline]
__se_sys_mount fs/namespace.c:3269 [inline]
__x64_sys_mount+0xbe/0x150 fs/namespace.c:3269
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fde6ed96c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000457569
RDX: 0000000020000040 RSI: 0000000020000000 RDI: 0000000000000000
RBP: 000000000072bf00 R08: 0000000020000340 R09: 0000000000000000
R10: 0000000000200000 R11: 0000000000000246 R12: 00007fde6ed976d4
R13: 00000000004c2c24 R14: 00000000004d4990 R15: 00000000ffffffff
Allocated by task 9424:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
__do_kmalloc mm/slab.c:3722 [inline]
__kmalloc_track_caller+0x157/0x760 mm/slab.c:3737
kstrdup+0x39/0x70 mm/util.c:49
bpf_symlink+0x26/0x140 kernel/bpf/inode.c:356
vfs_symlink+0x37a/0x5d0 fs/namei.c:4127
do_symlinkat+0x242/0x2d0 fs/namei.c:4154
__do_sys_symlink fs/namei.c:4173 [inline]
__se_sys_symlink fs/namei.c:4171 [inline]
__x64_sys_symlink+0x59/0x80 fs/namei.c:4171
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 9425:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kfree+0xcf/0x230 mm/slab.c:3817
bpf_evict_inode+0x11f/0x150 kernel/bpf/inode.c:565
evict+0x4b9/0x980 fs/inode.c:558
iput_final fs/inode.c:1550 [inline]
iput+0x674/0xa90 fs/inode.c:1576
do_unlinkat+0x733/0xa30 fs/namei.c:4069
__do_sys_unlink fs/namei.c:4110 [inline]
__se_sys_unlink fs/namei.c:4108 [inline]
__x64_sys_unlink+0x42/0x50 fs/namei.c:4108
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
In this scenario path lookup under RCU is racing with the final
unlink in case of symlinks. As Linus puts it in his analysis:
[...] We actually RCU-delay the inode freeing itself, but
when we do the final iput(), the "evict()" function is called
synchronously. Now, the simple fix would seem to just RCU-delay
the kfree() of the symlink data in bpf_evict_inode(). Maybe
that's the right thing to do. [...]
Al suggested to piggy-back on the ->destroy_inode() callback in
order to implement RCU deferral there which can then kfree() the
inode->i_link eventually right before putting inode back into
inode cache. By reusing free_inode_nonrcu() from there we can
avoid the need for our own inode cache and just reuse generic
one as we currently do.
And in-fact on top of all this we should just get rid of the
bpf_evict_inode() entirely. This means truncate_inode_pages_final()
and clear_inode() will then simply be called by the fs core via
evict(). Dropping the reference should really only be done when
inode is unhashed and nothing reachable anymore, so it's better
also moved into the final ->destroy_inode() callback.
Fixes: 0f98621bef ("bpf, inode: add support for symlinks and fix mtime/ctime")
Reported-by: syzbot+fb731ca573367b7f6564@syzkaller.appspotmail.com
Reported-by: syzbot+a13e5ead792d6df37818@syzkaller.appspotmail.com
Reported-by: syzbot+7a8ba368b47fdefca61e@syzkaller.appspotmail.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/lkml/0000000000006946d2057bbd0eef@google.com/T/
Alexei Starovoitov says:
====================
pull-request: bpf 2019-03-24
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) libbpf verision fix up from Daniel.
2) fix liveness propagation from Jakub.
3) fix verbose print of refcounted regs from Martin.
4) fix for large map allocations from Martynas.
5) fix use after free in sanitize_ptr_alu from Xu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
First one is fixing version in Makefile and shared object and
second one clarifies bump in version. Thanks!
v1 -> v2:
- Fix up soname, thanks Stanislav!
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The current documentation suggests that we would need to bump the
libbpf version on every change. Lets clarify this a bit more and
reflect what we do today in practice, that is, bumping it once per
development cycle.
Fixes: 76d1b894c5 ("libbpf: Document API and ABI conventions")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Even though libbpf's versioning script for the linker (libbpf.map)
is pointing to 0.0.2, the BPF_EXTRAVERSION in the Makefile has
not been updated along with it and is therefore still on 0.0.1.
While fixing up, I also noticed that the generated shared object
versioning information is missing, typical convention is to have
a linker name (libbpf.so), soname (libbpf.so.0) and real name
(libbpf.so.0.0.2) for library management. This is based upon the
LIBBPF_VERSION as well.
The build will then produce the following bpf libraries:
# ll libbpf*
libbpf.a
libbpf.so -> libbpf.so.0.0.2
libbpf.so.0 -> libbpf.so.0.0.2
libbpf.so.0.0.2
# readelf -d libbpf.so.0.0.2 | grep SONAME
0x000000000000000e (SONAME) Library soname: [libbpf.so.0]
And install them accordingly:
# rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld install
Auto-detecting system features:
... libelf: [ on ]
... bpf: [ on ]
CC /tmp/bld/libbpf.o
CC /tmp/bld/bpf.o
CC /tmp/bld/nlattr.o
CC /tmp/bld/btf.o
CC /tmp/bld/libbpf_errno.o
CC /tmp/bld/str_error.o
CC /tmp/bld/netlink.o
CC /tmp/bld/bpf_prog_linfo.o
CC /tmp/bld/libbpf_probes.o
CC /tmp/bld/xsk.o
LD /tmp/bld/libbpf-in.o
LINK /tmp/bld/libbpf.a
LINK /tmp/bld/libbpf.so.0.0.2
LINK /tmp/bld/test_libbpf
INSTALL /tmp/bld/libbpf.a
INSTALL /tmp/bld/libbpf.so.0.0.2
# ll /usr/local/lib64/libbpf.*
/usr/local/lib64/libbpf.a
/usr/local/lib64/libbpf.so -> libbpf.so.0.0.2
/usr/local/lib64/libbpf.so.0 -> libbpf.so.0.0.2
/usr/local/lib64/libbpf.so.0.0.2
Fixes: 1bf4b05810 ("tools: bpftool: add probes for eBPF program types")
Fixes: 1b76c13e4b ("bpf tools: Introduce 'bpf' library and add bpf feature check")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
clang produces a false-positive warning as it fails to notice
that "lost = true" implies that "ret" is initialized:
net/rxrpc/output.c:402:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
if (lost)
^~~~
net/rxrpc/output.c:437:6: note: uninitialized use occurs here
if (ret >= 0) {
^~~
net/rxrpc/output.c:402:2: note: remove the 'if' if its condition is always false
if (lost)
^~~~~~~~~
net/rxrpc/output.c:339:9: note: initialize the variable 'ret' to silence this warning
int ret, opt;
^
= 0
Rearrange the code to make that more obvious and avoid the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When checking the code with clang -Wsometimes-uninitialized we get the
following warning:
if (!tipc_link_is_establishing(l)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/tipc/node.c:847:46: note: uninitialized use occurs here
tipc_bearer_xmit(n->net, bearer_id, &xmitq, maddr);
net/tipc/node.c:831:2: note: remove the 'if' if its condition is always
true
if (!tipc_link_is_establishing(l)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/tipc/node.c:821:31: note: initialize the variable 'maddr' to silence
this warning
struct tipc_media_addr *maddr;
We fix this by initializing 'maddr' to NULL. For the matter of clarity,
we also test if 'xmitq' is non-empty before we use it and 'maddr'
further down in the function. It will never happen that 'xmitq' is non-
empty at the same time as 'maddr' is NULL, so this is a sufficient test.
Fixes: 598411d70f ("tipc: make resetting of links non-atomic")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A new mirred action is created by the tcf_mirred_init function. This
contains a list head struct which is inserted into a global list on
successful creation of a new action. However, after a creation, it is
still possible to error out and call the tcf_idr_release function. This,
in turn, calls the act_mirr cleanup function via __tcf_idr_release and
__tcf_action_put. This cleanup function tries to delete the list entry
which is as yet uninitialised, leading to a NULL pointer exception.
Fix this by initialising the list entry on creation of a new action.
Bug report:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000840c73067 P4D 8000000840c73067 PUD 858dcc067 PMD 0
Oops: 0002 [#1] SMP PTI
CPU: 32 PID: 5636 Comm: handler194 Tainted: G OE 5.0.0+ #186
Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015
RIP: 0010:tcf_mirred_release+0x42/0xa7 [act_mirred]
Code: f0 90 39 c0 e8 52 04 57 c8 48 c7 c7 b8 80 39 c0 e8 94 fa d4 c7 48 8b 93 d0 00 00 00 48 8b 83 d8 00 00 00 48 c7 c7 f0 90 39 c0 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 d0 00
RSP: 0018:ffffac4aa059f688 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9dcd1b214d00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9dcd1fa165f8 RDI: ffffffffc03990f0
RBP: ffff9dccf9c7af80 R08: 0000000000000a3b R09: 0000000000000000
R10: ffff9dccfa11f420 R11: 0000000000000000 R12: 0000000000000001
R13: ffff9dcd16b433c0 R14: ffff9dcd1b214d80 R15: 0000000000000000
FS: 00007f441bfff700(0000) GS:ffff9dcd1fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000839e64004 CR4: 00000000001606e0
Call Trace:
tcf_action_cleanup+0x59/0xca
__tcf_action_put+0x54/0x6b
__tcf_idr_release.cold.33+0x9/0x12
tcf_mirred_init.cold.20+0x22e/0x3b0 [act_mirred]
tcf_action_init_1+0x3d0/0x4c0
tcf_action_init+0x9c/0x130
tcf_exts_validate+0xab/0xc0
fl_change+0x1ca/0x982 [cls_flower]
tc_new_tfilter+0x647/0x8d0
? load_balance+0x14b/0x9e0
rtnetlink_rcv_msg+0xe3/0x370
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1d4/0x2b0
? rtnl_calcit.isra.31+0xf0/0xf0
netlink_rcv_skb+0x49/0x110
netlink_unicast+0x16f/0x210
netlink_sendmsg+0x1df/0x390
sock_sendmsg+0x36/0x40
___sys_sendmsg+0x27b/0x2c0
? futex_wake+0x80/0x140
? do_futex+0x2b9/0xac0
? ep_scan_ready_list.constprop.22+0x1f2/0x210
? ep_poll+0x7a/0x430
__sys_sendmsg+0x47/0x80
do_syscall_64+0x55/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 4e232818bd ("net: sched: act_mirred: remove dependency on rtnl lock")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bartek reported that after few cable unplug/replug cycles suddenly
replug isn't detected any longer. His system uses a RTL8106, I wasn't
able to reproduce the issue with RTL8168g. According to his bisect
the referenced commit caused the regression. As Realtek doesn't
release datasheets or errata it's hard to say what's the actual root
cause, but this change was reported to fix the issue.
Fixes: 38caff5a44 ("r8169: handle all interrupt events in the hard irq handler")
Reported-by: Bartosz Skrzypczak <barteks2x@gmail.com>
Suggested-by: Bartosz Skrzypczak <barteks2x@gmail.com>
Tested-by: Bartosz Skrzypczak <barteks2x@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/net/ethernet/ti/netcp_ethss.c:3661:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function.
./drivers/net/ethernet/ti/netcp_ethss.c:3665:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to ehea_get_eth_dn returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/net/ethernet/ibm/ehea/ehea_main.c:3163:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3154, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Douglas Miller <dougmill@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to of_parse_phandle returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/net/ethernet/xilinx/xilinx_axienet_main.c:1624:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1569, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Anirudha Sarangi <anirudh@xilinx.com>
Cc: John Linn <John.Linn@xilinx.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
After 90eff9096c ("net: phy: Allow splitting MDIO bus/device support
from PHYs") the various MDIO bus drivers were no longer parented with
config PHYLIB but with config MDIO_BUS which is not a menuconfig, fix
this by depending on MDIO_DEVICE which is a menuconfig.
This is visually nicer and less confusing for users.
Fixes: 90eff9096c ("net: phy: Allow splitting MDIO bus/device support from PHYs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the original patch network starts to crash on heavy load.
It's not fully clear why this additional register read has such side
effects, but removing it fixes the issue.
Thanks also to Alex for his contribution and hints.
[0] https://marc.info/?t=155268170400002&r=1&w=2
Fixes: e782410ed2 ("r8169: improve spurious interrupt detection")
Reported-by: VDR User <user.vdr@gmail.com>
Tested-by: VDR User <user.vdr@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7640ead939 ("bpf: verifier: make sure callees don't prune
with caller differences") connected up parentage chains of all
frames of the stack. It didn't, however, ensure propagate_liveness()
propagates all liveness information along those chains.
This means pruning happening in the callee may generate explored
states with incomplete liveness for the chains in lower frames
of the stack.
The included selftest is similar to the prior one from commit
7640ead939 ("bpf: verifier: make sure callees don't prune with
caller differences"), where callee would prune regardless of the
difference in r8 state.
Now we also initialize r9 to 0 or 1 based on a result from get_random().
r9 is never read so the walk with r9 = 0 gets pruned (correctly) after
the walk with r9 = 1 completes.
The selftest is so arranged that the pruning will happen in the
callee. Since callee does not propagate read marks of r8, the
explored state at the pruning point prior to the callee will
now ignore r8.
Propagate liveness on all frames of the stack when pruning.
Fixes: f4d7e40a5b ("bpf: introduce function calls (verification)")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When there is only one byte in a frag, the current calculation
using "(size + HNS3_MAX_BD_SIZE - 1) >> HNS3_MAX_BD_SIZE_OFFSET"
will return zero, because HNS3_MAX_BD_SIZE is 65535 and
HNS3_MAX_BD_SIZE_OFFSET is 16. So it will cause tx error when
a frag's size is one byte.
This patch fixes it by using DIV_ROUND_UP.
Fixes: 3fe13ed95d ("net: hns3: avoid mult + div op in critical data path")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it stands if a shrink is delayed because of an outstanding
rehash, we will go into a rescheduling loop without ever doing
the rehash.
This patch fixes this by still carrying out the rehash and then
rescheduling so that we can shrink after the completion of the
rehash should it still be necessary.
The return value of EEXIST captures this case and other cases
(e.g., another thread expanded/rehashed the table at the same
time) where we should still proceed with the rehash.
Fixes: da20420f83 ("rhashtable: Add nested tables")
Reported-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti says:
====================
net/sched: validate the control action with all the other parameters
currently, the kernel checks for bad values of the control action in
tcf_action_init_1(), after a successful call to the action's init()
function. When the control action is 'goto chain', this causes two
undesired behaviors:
1. "misconfigured action after replace that causes kernel crash":
if users replace a valid TC action with another one having invalid
control action, all the new configuration data (including the bad
control action) are applied successfully, even if the kernel returned
an error. As a consequence, it's possible to trigger a NULL pointer
dereference in the traffic path of every TC action (1), replacing the
control action with 'goto chain x', when chain <x> doesn't exist.
2. "refcount leak that makes kmemleak complain"
when a valid 'goto chain' action is overwritten with another action,
the kernel forgets to decrease refcounts in the chain.
The above problems can be fixed if we validate the control action in each
action's init() function, the same way as we are already doing for all the
other configuration parameters.
Now that chains can be released after an action is replaced, we need to
care about concurrent access of 'goto_chain' pointer: ensure we access it
through RCU, like we did with most action-specific configuration parameters.
- Patch 1 removes the wrong checks and provides functions that can be
used to properly validate control actions in individual actions
- Patch 2 to 16 fix individual actions, and add TDC selftest code to
verify the correct behavior (2)
- Patch 17 and 18 fix concurrent access issues on 'goto_chain', that can be
observed after the chain refcount leak is fixed.
Changes since v1:
- reword the cover letter
- condense the extack message in case tc_action_check_ctrlact() is called
with invalid parameters.
- add tcf_action_set_ctrlact() to avoid code duplication an make the
RCU-ification of 'goto_chain' easier.
- fix errors in act_ife, act_simple, act_skbedit, and avoid useless 'goto
end' in act_connmark, thanks a lot to Vlad Buslov.
- avoid dereferencing 'goto_chain' in tcf_gact_goto_chain_index(), so
we don't have to care about the grace period there.
- let actions respect the grace period when they release chains, thanks
to Cong Wang and Vlad Buslov.
Changes since RFC:
- include a fix for all TC actions
- add a selftest for each TC action
- squash fix for refcount leaks into a single patch, the first in the
series, thanks to Cong Wang
- ensure that chain refcount is released without tcfa_lock held, thanks
to Vlad Buslov
Notes:
(1) act_ipt didn't need any fix, as the control action is constantly equal
to TC_ACT_OK.
(2) the selftest for act_simple fails because userspace tc backend for
'simple' does not parse the control action correctly (and hardcodes it
to TC_ACT_PIPE).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
use RCU when accessing the action chain, to avoid use after free in the
traffic path when 'goto chain' is replaced on existing TC actions (see
script below). Since the control action is read in the traffic path
without holding the action spinlock, we need to explicitly ensure that
a->goto_chain is not NULL before dereferencing (i.e it's not sufficient
to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL
dereferences in tcf_action_goto_chain_exec() when the following script:
# tc chain add dev dd0 chain 42 ingress protocol ip flower \
> ip_proto udp action pass index 4
# tc filter add dev dd0 ingress protocol ip flower \
> ip_proto udp action csum udp goto chain 42 index 66
# tc chain del dev dd0 chain 42 ingress
(start UDP traffic towards dd0)
# tc action replace action csum udp pass index 66
was run repeatedly for several hours.
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Suggested-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
callers of tcf_gact_goto_chain_index() can potentially read an old value
of the chain index, or even dereference a NULL 'goto_chain' pointer,
because 'goto_chain' and 'tcfa_action' are read in the traffic path
without caring of concurrent write in the control path. The most recent
value of chain index can be read also from a->tcfa_action (it's encoded
there together with TC_ACT_GOTO_CHAIN bits), so we don't really need to
dereference 'goto_chain': just read the chain id from the control action.
Fixes: e457d86ada ("net: sched: add couple of goto_chain helpers")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- pass a pointer to struct tcf_proto in each actions's init() handler,
to allow validating the control action, checking whether the chain
exists and (eventually) refcounting it.
- remove code that validates the control action after a successful call
to the action's init() handler, and replace it with a test that forbids
addition of actions having 'goto_chain' and NULL goto_chain pointer at
the same time.
- add tcf_action_check_ctrlact(), that will validate the control action
and eventually allocate the action 'goto_chain' within the init()
handler.
- add tcf_action_set_ctrlact(), that will assign the control action and
swap the current 'goto_chain' pointer with the new given one.
This disallows 'goto_chain' on actions that don't initialize it properly
in their init() handler, i.e. calling tcf_action_check_ctrlact() after
successful IDR reservation and then calling tcf_action_set_ctrlact()
to assign 'goto_chain' and 'tcf_action' consistently.
By doing this, the kernel does not leak anymore refcounts when a valid
'goto chain' handle is replaced in TC actions, causing kmemleak splats
like the following one:
# tc chain add dev dd0 chain 42 ingress protocol ip flower \
> ip_proto tcp action drop
# tc chain add dev dd0 chain 43 ingress protocol ip flower \
> ip_proto udp action drop
# tc filter add dev dd0 ingress matchall \
> action gact goto chain 42 index 66
# tc filter replace dev dd0 ingress matchall \
> action gact goto chain 43 index 66
# echo scan >/sys/kernel/debug/kmemleak
<...>
unreferenced object 0xffff93c0ee09f000 (size 1024):
comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0
[<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0
[<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110
[<000000006126a348>] netlink_unicast+0x1a0/0x250
[<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0
[<00000000a25a2171>] sock_sendmsg+0x36/0x40
[<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0
[<00000000d0422042>] __sys_sendmsg+0x5e/0xa0
[<000000007a6c61f9>] do_syscall_64+0x5b/0x180
[<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0000000013eaa334>] 0xffffffffffffffff
Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both PCLK and HCLK are "required" clocks according to macb devicetree
documentation. There is a chance that devm_clk_get doesn't return a
negative error but just a NULL clock structure instead. In such a case
the driver proceeds as usual and uses pclk value 0 to calculate MDC
divisor which is incorrect. Hence fix the same in clock initialization.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the DP83825I ethernet PHY to the DP83822 driver.
These devices share the same WoL register bits and addresses.
The phy_driver init was made into a macro as there may be future
devices appended to this driver that will share the register space.
http://www.ti.com/lit/gpn/dp83825i
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree:
1) Remove a direct dependency with IPv6 introduced by the
sip_external_media feature, from Alin Nastac.
2) Fix bogus ENOENT when removing interval elements from set.
3) Set transport_header from br_netfilter to mimic the stack
behaviour, this partially fixes a checksum validation bug
from the SCTP connection tracking, from Xin Long.
4) Fix undefined reference to symbol in xt_TEE, due to missing
Kconfig dependencies, from Arnd Bergmann.
5) Check for NULL in skb_header_pointer() calls in ip6t_shr,
from Kangjie Lu.
6) Fix bogus EBUSY when removing an existing conntrack helper from
a transaction.
7) Fix module autoload of the redirect extension.
8) Remove duplicated transition in flowtable diagram in the existing
documentation.
9) Missing .release_ops call from error path in newrule() which
results module refcount leak, from Taehee Yoo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In genl_register_family(), when idr_alloc() fails,
we forget to free the memory we possibly allocate for
family->attrbuf.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 2ae0f17df1 ("genetlink: use idr to track families")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When cancelling a subscription, we have to clear the cancel bit in the
request before iterating over any established subscriptions with memcmp.
Otherwise no subscription will ever be found, and it will not be
possible to explicitly unsubscribe individual subscriptions.
Fixes: 8985ecc7c1 ("tipc: simplify endianness handling in topology subscriber")
Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzkaller hit 'KASAN: use-after-free Write in sanitize_ptr_alu' bug.
Call trace:
dump_stack+0xbf/0x12e
print_address_description+0x6a/0x280
kasan_report+0x237/0x360
sanitize_ptr_alu+0x85a/0x8d0
adjust_ptr_min_max_vals+0x8f2/0x1ca0
adjust_reg_min_max_vals+0x8ed/0x22e0
do_check+0x1ca6/0x5d00
bpf_check+0x9ca/0x2570
bpf_prog_load+0xc91/0x1030
__se_sys_bpf+0x61e/0x1f00
do_syscall_64+0xc8/0x550
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fault injection trace:
kfree+0xea/0x290
free_func_state+0x4a/0x60
free_verifier_state+0x61/0xe0
push_stack+0x216/0x2f0 <- inject failslab
sanitize_ptr_alu+0x2b1/0x8d0
adjust_ptr_min_max_vals+0x8f2/0x1ca0
adjust_reg_min_max_vals+0x8ed/0x22e0
do_check+0x1ca6/0x5d00
bpf_check+0x9ca/0x2570
bpf_prog_load+0xc91/0x1030
__se_sys_bpf+0x61e/0x1f00
do_syscall_64+0xc8/0x550
entry_SYSCALL_64_after_hwframe+0x49/0xbe
When kzalloc() fails in push_stack(), free_verifier_state() will free
current verifier state. As push_stack() returns, dst_reg was restored
if ptr_is_dst_reg is false. However, as member of the cur_state,
dst_reg is also freed, and error occurs when dereferencing dst_reg.
Simply fix it by testing ret of push_stack() before restoring dst_reg.
Fixes: 979d63d50c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Naresh reported that test_align fails because of the mismatch at the
verbose printout of the register states. The reason is due to the newly
added ref_obj_id.
ref_obj_id is only useful for refcounted reg. Thus, this patch fixes it
by only printing ref_obj_id for refcounted reg. While at it, it also uses
comma instead of space to separate between "id" and "ref_obj_id".
Fixes: 1b98658968 ("bpf: Fix bpf_tcp_sock and bpf_sk_fullsock issue related to bpf_sk_release")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make sure we don't try to enqueue XDP_REDIRECT frames to an
inexistent FQ.
While it is guaranteed not to have more than one queue per core,
having fewer queues than CPUs on an interface is a valid
configuration.
Fixes: d678be1dc1 ("dpaa2-eth: add XDP_REDIRECT support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lukas Wunner says:
====================
ks8851 fixes & cleanups
Four fixes and two cleanups for the Microchip (formerly Micrel) KSZ8851
SPI Ethernet driver.
Some of the fixes might even pass as stable material, but I haven't marked
them as such for cautiousness: Doesn't hurt letting them bake in linux-next
for a few weeks to raise the confidence, even though we've tested them
extensively on our Revolution Pi open source PLCs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>