syzbot found that if __inet_inherit_port() returns an error,
we call tcp_done() after inet_csk_prepare_forced_close(),
meaning the socket lock is no longer held.
We might fix this in a different way in net-next, but
for 5.4 it seems safer to relax the lockdep check.
Fixes: d983ea6f16 ("tcp: add rcu protection around tp->fastopen_rsk")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the following script:
# tc qdisc add dev eth0 clsact
# tc filter add dev eth0 egress protocol ip matchall \
> action mpls push protocol mpls_uc label 0x355aa bos 1
causes corruption of all IP packets transmitted by eth0. On TC egress, we
can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push'
operation will result in an overwrite of the first 4 octets in the packet
L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
error pattern is present also in the MPLS 'pop' operation. Fix this error
in act_mpls data plane, computing 'mac_len' as the difference between the
network header and the mac header (when not at TC ingress), and use it in
MPLS 'push'/'pop' core functions.
v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
in skb_mpls_pop(), reported by kbuild test robot
CC: Lorenzo Bianconi <lorenzo@kernel.org>
Fixes: 2a2ea50870 ("net: sched: add mpls manipulation actions to TC")
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the following script:
# tc qdisc add dev eth0 clsact
# tc filter add dev eth0 egress matchall action mpls pop
implicitly makes the kernel drop all packets transmitted by eth0, if they
don't have a MPLS header. This behavior is uncommon: other encapsulations
(like VLAN) just let the packet pass unmodified. Since the result of MPLS
'pop' operation would be the same regardless of the presence / absence of
MPLS header(s) in the original packet, we can let skb_mpls_pop() return 0
when dealing with non-MPLS packets.
For the OVS use-case, this is acceptable because __ovs_nla_copy_actions()
already ensures that MPLS 'pop' operation only occurs with packets having
an MPLS Ethernet type (and there are no other callers in current code, so
the semantic change should be ok).
v2: better documentation of use-cases for skb_mpls_pop(), thanks to Simon
Horman
Fixes: 2a2ea50870 ("net: sched: add mpls manipulation actions to TC")
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While invalidating the dst, we assign backhole_netdev instead of
loopback device. However, this device does not have idev pointer
and hence no ip6_ptr even if IPv6 is enabled. Possibly this has
triggered the syzbot reported crash.
The syzbot report does not have reproducer, however, this is the
only device that doesn't have matching idev created.
Crash instruction is :
static inline bool ip6_ignore_linkdown(const struct net_device *dev)
{
const struct inet6_dev *idev = __in6_dev_get(dev);
return !!idev->cnf.ignore_routes_with_linkdown; <= crash
}
Also ipv6 always assumes presence of idev and never checks for it
being NULL (as does the above referenced code). So adding a idev
for the blackhole_netdev to avoid this class of crashes in the future.
Signed-off-by: David S. Miller <davem@davemloft.net>
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_wmem_queued while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
sk_wmem_queued_add() helper is added so that we can in
the future convert to ADD_ONCE() or equivalent if/when
available.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_sndbuf while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
Note that other transports probably need similar fixes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_rcvbuf while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
Note that other transports probably need similar fixes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There two places where we fetch tp->urg_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write side use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few places where we fetch tp->snd_nxt while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few places where we fetch tp->write_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few places where we fetch tp->copied_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Note that tcp_inq_hint() was already using READ_ONCE(tp->copied_seq)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few places where we fetch tp->rcv_nxt while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt)
syzbot reported :
BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv
write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
sock_poll+0xed/0x250 net/socket.c:1256
vfs_poll include/linux/poll.h:90 [inline]
ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
ep_send_events fs/eventpoll.c:1793 [inline]
ep_poll+0xe3/0x900 fs/eventpoll.c:1930
do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
__do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
__se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
__x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both tcp_v4_err() and tcp_v6_err() do the following operations
while they do not own the socket lock :
fastopen = tp->fastopen_rsk;
snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
The problem is that without appropriate barrier, the compiler
might reload tp->fastopen_rsk and trigger a NULL deref.
request sockets are protected by RCU, we can simply add
the missing annotations and barriers to solve the issue.
Fixes: 168a8f5805 ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-10-12
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) a bunch of small fixes. Nothing critical.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If an ICMP packet comes in on the UDP socket backing an AF_RXRPC socket as
the UDP socket is being shut down, rxrpc_error_report() may get called to
deal with it after sk_user_data on the UDP socket has been cleared, leading
to a NULL pointer access when this local endpoint record gets accessed.
Fix this by just returning immediately if sk_user_data was NULL.
The oops looks like the following:
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
...
RIP: 0010:rxrpc_error_report+0x1bd/0x6a9
...
Call Trace:
? sock_queue_err_skb+0xbd/0xde
? __udp4_lib_err+0x313/0x34d
__udp4_lib_err+0x313/0x34d
icmp_unreach+0x1ee/0x207
icmp_rcv+0x25b/0x28f
ip_protocol_deliver_rcu+0x95/0x10e
ip_local_deliver+0xe9/0x148
__netif_receive_skb_one_core+0x52/0x6e
process_backlog+0xdc/0x177
net_rx_action+0xf9/0x270
__do_softirq+0x1b6/0x39a
? smpboot_register_percpu_thread+0xce/0xce
run_ksoftirqd+0x1d/0x42
smpboot_thread_fn+0x19e/0x1b3
kthread+0xf1/0xf6
? kthread_delayed_work_timer_fn+0x83/0x83
ret_from_fork+0x24/0x30
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Reported-by: syzbot+611164843bd48cc2190c@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
smc_rx_recvmsg() first checks if data is available, and then if
RCV_SHUTDOWN is set. There is a race when smc_cdc_msg_recv_action() runs
in between these 2 checks, receives data and sets RCV_SHUTDOWN.
In that case smc_rx_recvmsg() would return from receive without to
process the available data.
Fix that with a final check for data available if RCV_SHUTDOWN is set.
Move the check for data into a function and call it twice.
And use the existing helper smc_rx_data_available().
Fixes: 952310ccf2 ("smc: receive data from RMBE")
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
smc_cdc_rxed_any_close_or_senddone() is used as an end condition for the
receive loop. This conflicts with smc_cdc_msg_recv_action() which could
run in parallel and set the bits checked by
smc_cdc_rxed_any_close_or_senddone() before the receive is processed.
In that case we could return from receive with no data, although data is
available. The same applies to smc_rx_wait().
Fix this by checking for RCV_SHUTDOWN only, which is set in
smc_cdc_msg_recv_action() after the receive was actually processed.
Fixes: 952310ccf2 ("smc: receive data from RMBE")
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
If creation of an SMCD link group with VLAN id fails, the initial
smc_ism_get_vlan() step has to be reverted as well.
Fixes: c6ba7c9ba4 ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
sk->sk_backlog.len can be written by BH handlers, and read
from process contexts in a lockless way.
Note the write side should also use WRITE_ONCE() or a variant.
We need some agreement about the best way to do this.
syzbot reported :
BUG: KCSAN: data-race in tcp_add_backlog / tcp_grow_window.isra.0
write to 0xffff88812665f32c of 4 bytes by interrupt on cpu 1:
sk_add_backlog include/net/sock.h:934 [inline]
tcp_add_backlog+0x4a0/0xcc0 net/ipv4/tcp_ipv4.c:1737
tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
virtnet_receive drivers/net/virtio_net.c:1323 [inline]
virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
napi_poll net/core/dev.c:6352 [inline]
net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
read to 0xffff88812665f32c of 4 bytes by task 7292 on cpu 0:
tcp_space include/net/tcp.h:1373 [inline]
tcp_grow_window.isra.0+0x6b/0x480 net/ipv4/tcp_input.c:413
tcp_event_data_recv+0x68f/0x990 net/ipv4/tcp_input.c:717
tcp_rcv_established+0xbfe/0xf50 net/ipv4/tcp_input.c:5618
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2427
release_sock+0x61/0x160 net/core/sock.c:2943
tcp_recvmsg+0x63b/0x1a30 net/ipv4/tcp.c:2181
inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
sock_recvmsg_nosec net/socket.c:871 [inline]
sock_recvmsg net/socket.c:889 [inline]
sock_recvmsg+0x92/0xb0 net/socket.c:885
sock_read_iter+0x15f/0x1e0 net/socket.c:967
call_read_iter include/linux/fs.h:1864 [inline]
new_sync_read+0x389/0x4f0 fs/read_write.c:414
__vfs_read+0xb1/0xc0 fs/read_write.c:427
vfs_read fs/read_write.c:461 [inline]
vfs_read+0x143/0x2c0 fs/read_write.c:446
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7292 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
sock_rcvlowat() or int_sk_rcvlowat() might be called without the socket
lock for example from tcp_poll().
Use READ_ONCE() to document the fact that other cpus might change
sk->sk_rcvlowat under us and avoid KCSAN splats.
Use WRITE_ONCE() on write sides too.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
sk_add_backlog() callers usually read sk->sk_rcvbuf without
owning the socket lock. This means sk_rcvbuf value can
be changed by other cpus, and KCSAN complains.
Add READ_ONCE() annotations to document the lockless nature
of these reads.
Note that writes over sk_rcvbuf should also use WRITE_ONCE(),
but this will be done in separate patches to ease stable
backports (if we decide this is relevant for stable trees).
BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
write to 0xffff88812ab369f8 of 8 bytes by interrupt on cpu 1:
__sk_add_backlog include/net/sock.h:902 [inline]
sk_add_backlog include/net/sock.h:933 [inline]
tcp_add_backlog+0x45a/0xcc0 net/ipv4/tcp_ipv4.c:1737
tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
virtnet_receive drivers/net/virtio_net.c:1323 [inline]
virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
napi_poll net/core/dev.c:6352 [inline]
net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
read to 0xffff88812ab369f8 of 8 bytes by task 7271 on cpu 0:
tcp_recvmsg+0x470/0x1a30 net/ipv4/tcp.c:2047
inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
sock_recvmsg_nosec net/socket.c:871 [inline]
sock_recvmsg net/socket.c:889 [inline]
sock_recvmsg+0x92/0xb0 net/socket.c:885
sock_read_iter+0x15f/0x1e0 net/socket.c:967
call_read_iter include/linux/fs.h:1864 [inline]
new_sync_read+0x389/0x4f0 fs/read_write.c:414
__vfs_read+0xb1/0xc0 fs/read_write.c:427
vfs_read fs/read_write.c:461 [inline]
vfs_read+0x143/0x2c0 fs/read_write.c:446
ksys_read+0xd5/0x1b0 fs/read_write.c:587
__do_sys_read fs/read_write.c:597 [inline]
__se_sys_read fs/read_write.c:595 [inline]
__x64_sys_read+0x4c/0x60 fs/read_write.c:595
do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7271 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
tcp_memory_pressure is read without holding any lock,
and its value could be changed on other cpus.
Use READ_ONCE() to annotate these lockless reads.
The write side is already using atomic ops.
Fixes: b8da51ebb1 ("tcp: introduce tcp_under_memory_pressure()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
reqsk_queue_empty() is called from inet_csk_listen_poll() while
other cpus might write ->rskq_accept_head value.
Use {READ|WRITE}_ONCE() to avoid compiler tricks
and potential KCSAN splats.
Fixes: fff1f3001c ("tcp: add a spinlock to protect struct request_sock_queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
As mentioned in https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance
a C compiler can legally transform :
if (memory_pressure && *memory_pressure)
*memory_pressure = 0;
to :
if (memory_pressure)
*memory_pressure = 0;
Fixes: 0604475119 ("tcp: add TCPMemoryPressuresChrono counter")
Fixes: 180d8cd942 ("foundations of per-cgroup memory pressure controlling.")
Fixes: 3ab224be6d ("[NET] CORE: Introducing new memory accounting interface.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The flag NLM_F_ECHO aims to reply to the user the message notified to all
listeners.
It was not the case with the command RTM_NEWNSID, let's fix this.
Fixes: 0c7aecd4bd ("netns: add rtnl cmd to add and get peer netns ids")
Reported-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Tested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
If tcf_register_action failed, mirred_device_notifier
should be unregistered.
Fixes: 3b87956ea6 ("net sched: fix race in mirred device removal")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
When configuring a taprio instance if "flags" is not specified (or
it's zero), taprio currently replies with an "Invalid argument" error.
So, set the return value to zero after we are done with all the
checks.
Fixes: 9c66d15646 ("taprio: Add support for hardware offloading")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
This patch is to fix a NULL-ptr deref in selinux_socket_connect_helper:
[...] kasan: GPF could be caused by NULL-ptr deref or user memory access
[...] RIP: 0010:selinux_socket_connect_helper+0x94/0x460
[...] Call Trace:
[...] selinux_sctp_bind_connect+0x16a/0x1d0
[...] security_sctp_bind_connect+0x58/0x90
[...] sctp_process_asconf+0xa52/0xfd0 [sctp]
[...] sctp_sf_do_asconf+0x785/0x980 [sctp]
[...] sctp_do_sm+0x175/0x5a0 [sctp]
[...] sctp_assoc_bh_rcv+0x285/0x5b0 [sctp]
[...] sctp_backlog_rcv+0x482/0x910 [sctp]
[...] __release_sock+0x11e/0x310
[...] release_sock+0x4f/0x180
[...] sctp_accept+0x3f9/0x5a0 [sctp]
[...] inet_accept+0xe7/0x720
It was caused by that the 'newsk' sk_socket was not set before going to
security sctp hook when processing asconf chunk with SCTP_PARAM_ADD_IP
or SCTP_PARAM_SET_PRIMARY:
inet_accept()->
sctp_accept():
lock_sock():
lock listening 'sk'
do_softirq():
sctp_rcv(): <-- [1]
asconf chunk arrives and
enqueued in 'sk' backlog
sctp_sock_migrate():
set asoc's sk to 'newsk'
release_sock():
sctp_backlog_rcv():
lock 'newsk'
sctp_process_asconf() <-- [2]
unlock 'newsk'
sock_graft():
set sk_socket <-- [3]
As it shows, at [1] the asconf chunk would be put into the listening 'sk'
backlog, as accept() was holding its sock lock. Then at [2] asconf would
get processed with 'newsk' as asoc's sk had been set to 'newsk'. However,
'newsk' sk_socket is not set until [3], while selinux_sctp_bind_connect()
would deref it, then kernel crashed.
Here to fix it by adding the chunk to sk_backlog until newsk sk_socket is
set when .accept() is done.
Note that sk->sk_socket can be NULL when the sock is closed, so SOCK_DEAD
flag is also needed to check in sctp_newsk_ready().
Thanks to Ondrej for reviewing the code.
Fixes: d452930fd3 ("selinux: Add SCTP support")
Reported-by: Ying Xu <yinxu@redhat.com>
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
ip6erspan driver calls ether_setup(), after commit 61e84623ac
("net: centralize net_device min/max MTU checking"), the range
of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
It causes the dev mtu of the erspan device to not be greater
than 1500, this limit value is not correct for ip6erspan tap
device.
Fixes: 61e84623ac ("net: centralize net_device min/max MTU checking")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
* allow scanning when operating on radar channels in
ETSI regdomains
* accept deauth frames in IBSS - we have code to parse
and handle them, but were dropping them early
* fix an allocation failure path in hwsim
* fix a failure path memory leak in nl80211 FTM code
* fix RCU handling & locking in multi-BSSID parsing
* reject malformed SSID in mac80211 (this shouldn't
really be able to happen, but defense in depth)
* avoid userspace buffer overrun in ancient wext code
if SSID was too long
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl2cf4YACgkQB8qZga/f
l8T+FA/+J4yDjue4Zw1cX8jgBr3+8l2+5sJIs59rGAcR30HYHtQvkaq4UrOy3kJ+
w4zB2lqMIeMA/tT3+OaTLF2ELz1lBN5VqZ4fnNguFuYDOkNjzkZwQCe51z6gMwrW
xk3dxLJz+QCDO7GF3nQer0VFBChL7fVdlIYyEoq/obwYylOUqyRGxtXa8yIaxaEe
xE7OK67g8rVVestOGO+i/7F/ZJ6DX0hQ0ounWRoa1tTHvGjInD7t2qu69n6/66zO
5AIVbDXGaQ+Eegk54jlJcJKQI367WNpRqOJ2HUonkoAKWID8kxbp8LLWne3DqC9i
Ffmme2/johrGSlvgPBUCV9SUmxulQnf8vpt8mKMh2Po+1mLInFltPSPckkpoe12S
5ps47miIZQDnVEfaojG7I2LMlfSYM1olBkAQqcqiizMISn4AYvu8H0NXJBJXV+7j
ezgcyPCi9x2wWYy2KhKsoXPzHo8cdRUT0HbmQUe4zUXIhrlH7CZHHhkgnzR04Qz5
Mh1TWnKr1RJEC5sua1cVDeEBXyjJDfUI+zRuq6jIvWIya+QEQ0ZWUkdjTtLFIJkY
89e932SuHsUFEYqYYebyDnUQ7q8ZNGPVNpfC7/CTnU/T/uJoTFYDsB6ocmK3k9Cn
4AFC/EBXD4WAl2FXDdUdAMPyLLxQwc9AxJlXMq1LOyCIvQDGopY=
=k685
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2019-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A number of fixes:
* allow scanning when operating on radar channels in
ETSI regdomains
* accept deauth frames in IBSS - we have code to parse
and handle them, but were dropping them early
* fix an allocation failure path in hwsim
* fix a failure path memory leak in nl80211 FTM code
* fix RCU handling & locking in multi-BSSID parsing
* reject malformed SSID in mac80211 (this shouldn't
really be able to happen, but defense in depth)
* avoid userspace buffer overrun in ancient wext code
if SSID was too long
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
For TCA_ACT_KIND, we have to keep the backward compatibility too,
and rely on nla_strlcpy() to check and terminate the string with
a NUL.
Note for TC actions, nla_strcmp() is already used to compare kind
strings, so we don't need to fix other places.
Fixes: 199ce850ce ("net_sched: add policy validation for action attributes")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Marcelo noticed a backward compatibility issue of TCA_KIND
after we move from NLA_STRING to NLA_NUL_STRING, so it is probably
too late to change it.
Instead, to make everyone happy, we can just insert a NUL to
terminate the string with nla_strlcpy() like we do for TC actions.
Fixes: 62794fc4fb ("net_sched: add max len check for TCA_KIND")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
If llc_conn_state_process() sees that llc_conn_service() put the skb on
a list, it will drop one fewer references to it. This is wrong because
the current behavior is that llc_conn_service() never consumes a
reference to the skb.
The code also makes the number of skb references being dropped
conditional on which of ind_prim and cfm_prim are nonzero, yet neither
of these affects how many references are *acquired*. So there is extra
code that tries to fix this up by sometimes taking another reference.
Remove the unnecessary/broken refcounting logic and instead just add an
skb_get() before the only two places where an extra reference is
actually consumed.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
All callers of llc_conn_state_process() except llc_build_and_send_pkt()
(via llc_ui_sendmsg() -> llc_ui_send_data()) assume that it always
consumes a reference to the skb. Fix this caller to do the same.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
syzbot reported:
BUG: memory leak
unreferenced object 0xffff888116270800 (size 224):
comm "syz-executor641", pid 7047, jiffies 4294947360 (age 13.860s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 20 e1 2a 81 88 ff ff 00 40 3d 2a 81 88 ff ff . .*.....@=*....
backtrace:
[<000000004d41b4cc>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
[<000000004d41b4cc>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<000000004d41b4cc>] slab_alloc_node mm/slab.c:3269 [inline]
[<000000004d41b4cc>] kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579
[<00000000506a5965>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:198
[<000000001ba5a161>] alloc_skb include/linux/skbuff.h:1058 [inline]
[<000000001ba5a161>] alloc_skb_with_frags+0x5f/0x250 net/core/skbuff.c:5327
[<0000000047d9c78b>] sock_alloc_send_pskb+0x269/0x2a0 net/core/sock.c:2225
[<000000003828fe54>] sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2242
[<00000000e34d94f9>] llc_ui_sendmsg+0x10a/0x540 net/llc/af_llc.c:933
[<00000000de2de3fb>] sock_sendmsg_nosec net/socket.c:652 [inline]
[<00000000de2de3fb>] sock_sendmsg+0x54/0x70 net/socket.c:671
[<000000008fe16e7a>] __sys_sendto+0x148/0x1f0 net/socket.c:1964
[...]
The bug is that llc_sap_state_process() always takes an extra reference
to the skb, but sometimes neither llc_sap_next_state() nor
llc_sap_state_process() itself drops this reference.
Fix it by changing llc_sap_next_state() to never consume a reference to
the skb, rather than sometimes do so and sometimes not. Then remove the
extra skb_get() and kfree_skb() from llc_sap_state_process().
Reported-by: syzbot+6bf095f9becf5efef645@syzkaller.appspotmail.com
Reported-by: syzbot+31c16aa4202dace3812e@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
In non-ETSI regulatory domains scan is blocked when operating channel
is a DFS channel. For ETSI, however, once DFS channel is marked as
available after the CAC, this channel will remain available (for some
time) even after leaving this channel.
Therefore a scan can be done without any impact on the availability
of the DFS channel as no new CAC is required after the scan.
Enable scan in mac80211 in these cases.
Signed-off-by: Aaron Komisar <aaron.komisar@tandemg.com>
Link: https://lore.kernel.org/r/1570024728-17284-1-git-send-email-aaron.komisar@tandemg.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We can process deauth frames and all, but we drop them very
early in the RX path today - this could never have worked.
Fixes: 2cc59e784b ("mac80211: reply to AUTH with DEAUTH if sta allocation fails in IBSS")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20191004123706.15768-2-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
cfg80211_update_notlisted_nontrans() leaves the RCU critical session
too early, while still using nontrans_ssid which is RCU protected. In
addition, it performs a bunch of RCU pointer update operations such
as rcu_access_pointer and rcu_assign_pointer.
The caller, cfg80211_inform_bss_frame_data(), also accesses the RCU
pointer without holding the lock.
Just wrap all of this with bss_lock.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20191004123706.15768-3-luca@coelho.fi
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In nl80211_get_ftm_responder_stats, a new skb is created via nlmsg_new
named msg. If nl80211hdr_put() fails, then msg should be released. The
return statement should be replace by goto to error handling code.
Fixes: 81e54d08d9 ("cfg80211: support FTM responder configuration/statistics")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Link: https://lore.kernel.org/r/20191004194220.19412-1-navid.emamdoost@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix the cleanup of the crypto state on a call after the call has been
disconnected. As the call has been disconnected, its connection ref has
been discarded and so we can't go through that to get to the security ops
table.
Fix this by caching the security ops pointer in the rxrpc_call struct and
using that when freeing the call security state. Also use this in other
places we're dealing with call-specific security.
The symptoms look like:
BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60
net/rxrpc/call_object.c:481
Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764
Fixes: 1db88c5343 ("rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto")
Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
The rxrpc_peer record needs to hold a reference on the rxrpc_local record
it points as the peer is used as a base to access information in the
rxrpc_local record.
This can cause problems in __rxrpc_put_peer(), where we need the network
namespace pointer, and in rxrpc_send_keepalive(), where we need to access
the UDP socket, leading to symptoms like:
BUG: KASAN: use-after-free in __rxrpc_put_peer net/rxrpc/peer_object.c:411
[inline]
BUG: KASAN: use-after-free in rxrpc_put_peer+0x685/0x6a0
net/rxrpc/peer_object.c:435
Read of size 8 at addr ffff888097ec0058 by task syz-executor823/24216
Fix this by taking a ref on the local record for the peer record.
Fixes: ace45bec6d ("rxrpc: Fix firewall route keepalive")
Fixes: 2baec2c3f8 ("rxrpc: Support network namespacing")
Reported-by: syzbot+b9be979c55f2bea8ed30@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
rxrpc_put_call() calls trace_rxrpc_call() after it has done the decrement
of the refcount - which looks at the debug_id in the call record. But
unless the refcount was reduced to zero, we no longer have the right to
look in the record and, indeed, it may be deleted by some other thread.
Fix this by getting the debug_id out before decrementing the refcount and
then passing that into the tracepoint.
Fixes: e34d4234b0 ("rxrpc: Trace rxrpc_call usage")
Signed-off-by: David Howells <dhowells@redhat.com>
rxrpc_put_*conn() calls trace_rxrpc_conn() after they have done the
decrement of the refcount - which looks at the debug_id in the connection
record. But unless the refcount was reduced to zero, we no longer have the
right to look in the record and, indeed, it may be deleted by some other
thread.
Fix this by getting the debug_id out before decrementing the refcount and
then passing that into the tracepoint.
Fixes: 363deeab6d ("rxrpc: Add connection tracepoint and client conn state tracepoint")
Signed-off-by: David Howells <dhowells@redhat.com>
rxrpc_put_peer() calls trace_rxrpc_peer() after it has done the decrement
of the refcount - which looks at the debug_id in the peer record. But
unless the refcount was reduced to zero, we no longer have the right to
look in the record and, indeed, it may be deleted by some other thread.
Fix this by getting the debug_id out before decrementing the refcount and
then passing that into the tracepoint.
This can cause the following symptoms:
BUG: KASAN: use-after-free in __rxrpc_put_peer net/rxrpc/peer_object.c:411
[inline]
BUG: KASAN: use-after-free in rxrpc_put_peer+0x685/0x6a0
net/rxrpc/peer_object.c:435
Read of size 8 at addr ffff888097ec0058 by task syz-executor823/24216
Fixes: 1159d4b496 ("rxrpc: Add a tracepoint to track rxrpc_peer refcounting")
Reported-by: syzbot+b9be979c55f2bea8ed30@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
When sendmsg() finds a call to continue on with, if the call is in an
inappropriate state, it doesn't release the ref it just got on that call
before returning an error.
This causes the following symptom to show up with kasan:
BUG: KASAN: use-after-free in rxrpc_send_keepalive+0x8a2/0x940
net/rxrpc/output.c:635
Read of size 8 at addr ffff888064219698 by task kworker/0:3/11077
where line 635 is:
whdr.epoch = htonl(peer->local->rxnet->epoch);
The local endpoint (which cannot be pinned by the call) has been released,
but not the peer (which is pinned by the call).
Fix this by releasing the call in the error path.
Fixes: 37411cad63 ("rxrpc: Fix potential NULL-pointer exception")
Reported-by: syzbot+d850c266e3df14da1d31@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Rajendra reported a kernel panic when a link was taken down:
[ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
<snip>
[ 6870.570501] Call Trace:
[ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
[ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
[ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
[ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
[ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
[ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
[ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
[ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
[ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
[ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
[ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
addrconf_dad_work is kicked to be scheduled when a device is brought
up. There is a race between addrcond_dad_work getting scheduled and
taking the rtnl lock and a process taking the link down (under rtnl).
The latter removes the host route from the inet6_addr as part of
addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
to use the host route in __ipv6_ifa_notify. If the down event removes
the host route due to the race to the rtnl, then the BUG listed above
occurs.
Since the DAD sequence can not be aborted, add a check for the missing
host route in __ipv6_ifa_notify. The only way this should happen is due
to the previously mentioned race. The host route is created when the
address is added to an interface; it is only removed on a down event
where the address is kept. Add a warning if the host route is missing
AND the device is up; this is a situation that should never happen.
Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>