326 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Peilin Ye | 84ad0af0bc |
net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in ingress_init() to point to net_device::miniq_ingress. ingress Qdiscs access this per-net_device pointer in mini_qdisc_pair_swap(). Similar for clsact Qdiscs and miniq_egress. Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER requests (thanks Hillf Danton for the hint), when replacing ingress or clsact Qdiscs, for example, the old Qdisc ("@old") could access the same miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"), causing race conditions [1] including a use-after-free bug in mini_qdisc_pair_swap() reported by syzbot: BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901 ... Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319 print_report mm/kasan/report.c:430 [inline] kasan_report+0x11c/0x130 mm/kasan/report.c:536 mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573 tcf_chain_head_change_item net/sched/cls_api.c:495 [inline] tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509 tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline] tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline] tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266 ... @old and @new should not affect each other. In other words, @old should never modify miniq_{in,e}gress after @new, and @new should not update @old's RCU state. Fixing without changing sch_api.c turned out to be difficult (please refer to Closes: for discussions). Instead, make sure @new's first call always happen after @old's last call (in {ingress,clsact}_destroy()) has finished: In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests, and call qdisc_destroy() for @old before grafting @new. Introduce qdisc_refcount_dec_if_one() as the counterpart of qdisc_refcount_inc_nz() used for filter requests. Introduce a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check, just like qdisc_put() etc. Depends on patch "net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs". [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under TC_H_ROOT (no longer possible after commit |
|
Eric Dumazet | d636fc5dd6 |
net: sched: add rcu annotations around qdisc->qdisc_sleeping
syzbot reported a race around qdisc->qdisc_sleeping [1]
It is time we add proper annotations to reads and writes to/from
qdisc->qdisc_sleeping.
[1]
BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu
read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1:
qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331
__tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174
tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547
rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0:
dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115
qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103
tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693
rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023
Fixes:
|
|
Yajun Deng | 2f0f9465ad |
net: sched: Print msecs when transmit queue time out
The kernel will print several warnings in a short period of time when it stalls. Like this: First warning: [ 7100.097547] ------------[ cut here ]------------ [ 7100.097550] NETDEV WATCHDOG: eno2 (xxx): transmit queue 8 timed out [ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x260/0x270 ... Second warning: [ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU [ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x400000000000000 softirq=367 3137/3673146 fqs=13844 [ 7147.756960] (t=60001 jiffies g=4322709 q=133381) [ 7147.756962] NMI backtrace for cpu 24 ... We calculate that the transmit queue start stall should occur before 7095s according to watchdog_timeo, the rcu start stall at 7087s. These two times are close together, it is difficult to confirm which happened first. To let users know the exact time the stall started, print msecs when the transmit queue time out. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Jakub Kicinski | 60ad1100d5 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/testing/selftests/net/.gitignore sort the net-next version and use it Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
Wang Hai | f612466ebe |
net/sched: fix netdevice reference leaks in attach_default_qdiscs()
In attach_default_qdiscs(), if a dev has multiple queues and queue 0 fails
to attach qdisc because there is no memory in attach_one_default_qdisc().
Then dev->qdisc will be noop_qdisc by default. But the other queues may be
able to successfully attach to default qdisc.
In this case, the fallback to noqueue process will be triggered. If the
original attached qdisc is not released and a new one is directly
attached, this will cause netdevice reference leaks.
The following is the bug log:
veth0: default qdisc (fq_codel) fail, fallback to noqueue
unregister_netdevice: waiting for veth0 to become free. Usage count = 32
leaked reference.
qdisc_alloc+0x12e/0x210
qdisc_create_dflt+0x62/0x140
attach_one_default_qdisc.constprop.41+0x44/0x70
dev_activate+0x128/0x290
__dev_open+0x12a/0x190
__dev_change_flags+0x1a2/0x1f0
dev_change_flags+0x23/0x60
do_setlink+0x332/0x1150
__rtnl_newlink+0x52f/0x8e0
rtnl_newlink+0x43/0x70
rtnetlink_rcv_msg+0x140/0x3b0
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x1bb/0x290
netlink_sendmsg+0x37c/0x4e0
sock_sendmsg+0x5f/0x70
____sys_sendmsg+0x208/0x280
Fix this bug by clearing any non-noop qdiscs that may have been assigned
before trying to re-attach.
Fixes:
|
|
Zhengchao Shao | 44387d1736 |
net: sched: remove unnecessary init of qdisc skb head
The memory allocated by using kzallloc_node and kcalloc has been cleared. Therefore, the structure members of the new qdisc are 0. So there's no need to explicitly assign a value of 0. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Kuniyuki Iwashima | bf955b5ab8 |
net: Fix data-races around weight_p and dev_weight_[rt]x_bias.
While reading weight_p, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Also, dev_[rt]x_weight can be read/written at the same time. So, we need to use READ_ONCE() and WRITE_ONCE() for its access. Moreover, to use the same weight_p while changing dev_[rt]x_weight, we add a mutex in proc_do_dev_weight(). Fixes: |
|
Vladimir Oltean | 4873a1b202 |
net/sched: remove hacks added to dev_trans_start() for bonding to work
Now that the bonding driver keeps track of the last TX time of ARP and NS probes, we effectively revert the following commits: |
|
Jakub Kicinski | d62607c3fe |
net: rename reference+tracking helpers
Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
Eric Dumazet | c9a40d1c87 |
net_sched: make qdisc_reset() smaller
For some unknown reason qdisc_reset() is using a convoluted way of freeing two lists of skbs. Use __skb_queue_purge() instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220414011004.2378350-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
Eric Dumazet | 5891cd5ec4 |
net_sched: add __rcu annotation to netdev->qdisc
syzbot found a data-race [1] which lead me to add __rcu
annotations to netdev->qdisc, and proper accessors
to get LOCKDEP support.
[1]
BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu
write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
__dev_open+0x2e9/0x3a0 net/core/dev.c:1416
__dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
__rtnl_newlink net/core/rtnetlink.c:3489 [inline]
rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmsg+0x195/0x230 net/socket.c:2496
__do_sys_sendmsg net/socket.c:2505 [inline]
__se_sys_sendmsg net/socket.c:2503 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
__tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmsg+0x195/0x230 net/socket.c:2496
__do_sys_sendmsg net/socket.c:2505 [inline]
__se_sys_sendmsg net/socket.c:2503 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes:
|
|
Kevin Bracey | fb80445c43 |
net_sched: restore "mpu xxx" handling
commit |
|
Eric Dumazet | f12bf6f3f9 |
net: watchdog: add net device refcount tracker
Add a netdevice_tracker inside struct net_device, to track the self reference when a device has an active watchdog timer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
Eric Dumazet | 606509f27f |
net/sched: add net device refcount tracker to struct Qdisc
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
Eric Dumazet | bec251bc8b |
net: no longer stop all TX queues in dev_watchdog()
There is no reason for stopping all TX queues from dev_watchdog() Not only this stops feeding the NIC, it also migrates all qdiscs to be serviced on the cpu calling netif_tx_unlock(), causing a potential latency artifact. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Eric Dumazet | dab8fe3207 |
net: do not inline netif_tx_lock()/netif_tx_unlock()
These are not fast path, there is no point in inlining them. Also provide netif_freeze_queues()/netif_unfreeze_queues() so that we can use them from dev_watchdog() in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Eric Dumazet | 5337824f4d |
net: annotate accesses to queue->trans_start
In following patches, dev_watchdog() will no longer stop all queues. It will read queue->trans_start locklessly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Eric Dumazet | 8160fb43d5 |
net: use an atomic_long_t for queue->trans_timeout
tx_timeout_show() assumed dev_watchdog() would stop all the queues, to fetch queue->trans_timeout under protection of the queue->_xmit_lock. As we want to no longer disrupt transmits, we use an atomic_long_t instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: david decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Seth Forshee | 85c0c3eb9a |
net: sch: simplify condtion for selecting mini_Qdisc_pair buffer
The only valid values for a miniq pointer are NULL or a pointer to miniq1 or miniq2, so testing for miniq_old != &miniq1 is functionally equivalent to testing that it is NULL or equal to &miniq2. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Link: https://lore.kernel.org/r/20211026183721.137930-1-seth@forshee.me Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
Seth Forshee | 267463823a |
net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
Currently rcu_barrier() is used to ensure that no readers of the inactive mini_Qdisc buffer remain before it is reused. This waits for any pending RCU callbacks to complete, when all that is actually required is to wait for one RCU grace period to elapse after the buffer was made inactive. This means that using rcu_barrier() may result in unnecessary waits. To improve this, store the current RCU state when a buffer is made inactive and use poll_state_synchronize_rcu() to check whether a full grace period has elapsed before reusing it. If a full grace period has not elapsed, wait for a grace period to elapse, and in the non-RT case use synchronize_rcu_expedited() to hasten it. Since this approach eliminates the RCU callback it is no longer necessary to synchronize_rcu() in the tp_head==NULL case. However, the RCU state should still be saved for the previously active buffer. Before this change I would typically see mini_qdisc_pair_swap() take tens of milliseconds to complete. After this change it typcially finishes in less than 1 ms, and often it takes just a few microseconds. Thanks to Paul for walking me through the options for improving this. Cc: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Link: https://lore.kernel.org/r/20211026130700.121189-1-seth@forshee.me Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
Ahmed S. Darwish | 29cbcd8582 |
net: sched: Remove Qdisc::running sequence counter
The Qdisc::running sequence counter has two uses: 1. Reliably reading qdisc's tc statistics while the qdisc is running (a seqcount read/retry loop at gnet_stats_add_basic()). 2. As a flag, indicating whether the qdisc in question is running (without any retry loops). For the first usage, the Qdisc::running sequence counter write section, qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what is actually needed: the raw qdisc's bstats update. A u64_stats sync point was thus introduced (in previous commits) inside the bstats structure itself. A local u64_stats write section is then started and stopped for the bstats updates. Use that u64_stats sync point mechanism for the bstats read/retry loop at gnet_stats_add_basic(). For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag, accessed with atomic bitops, is sufficient. Using a bit flag instead of a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads to the SMP barriers implicitly added through raw_read_seqcount() and write_seqcount_begin/end() getting removed. All call sites have been surveyed though, and no required ordering was identified. Now that the qdisc->running sequence counter is no longer used, remove it. Note, using u64_stats implies no sequence counter protection for 64-bit architectures. This can lead to the qdisc tc statistics "packets" vs. "bytes" values getting out of sync on rare occasions. The individual values will still be valid. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Ahmed S. Darwish | 50dc9a8572 |
net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types
The only factor differentiating per-CPU bstats data type (struct gnet_stats_basic_cpu) from the packed non-per-CPU one (struct gnet_stats_basic_packed) was a u64_stats sync point inside the former. The two data types are now equivalent: earlier commits added a u64_stats sync point to the latter. Combine both data types into "struct gnet_stats_basic_sync". This eliminates redundancy and simplifies the bstats read/write APIs. Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit architectures, u64_stats sync points do not use sequence counter protection. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Ahmed S. Darwish | 67c9e6270f |
net: sched: Protect Qdisc::bstats with u64_stats
The not-per-CPU variant of qdisc tc (traffic control) statistics, Qdisc::gnet_stats_basic_packed bstats, is protected with Qdisc::running sequence counter. This sequence counter is used for reliably protecting bstats reads from parallel writes. Meanwhile, the seqcount's write section covers a much wider area than bstats update: qdisc_run_begin() => qdisc_run_end(). That read/write section asymmetry can lead to needless retries of the read section. To prepare for removing the Qdisc::running sequence counter altogether, introduce a u64_stats sync point inside bstats instead. Modify _bstats_update() to start/end the bstats u64_stats write section. For bisectability, and finer commits granularity, the bstats read section is still protected with a Qdisc::running read/retry loop and qdisc_run_begin/end() still starts/ends that seqcount write section. Once all call sites are modified to use _bstats_update(), the Qdisc::running seqcount will be removed and bstats read/retry loop will be modified to utilize the internal u64_stats sync point. Note, using u64_stats implies no sequence counter protection for 64-bit architectures. This can lead to the statistics "packets" vs. "bytes" values getting out of sync on rare occasions. The individual values will still be valid. [bigeasy: Minor commit message edits, init all gnet_stats_basic_packed.] Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Jakub Kicinski | f7116fb460 |
net: sched: move and reuse mq_change_real_num_tx()
The code for handling active queue changes is identical between mq and mqprio, reuse it. Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Jakub Kicinski | 1e080f1775 |
net: sched: update default qdisc visibility after Tx queue cnt changes
mq / mqprio make the default child qdiscs visible. They only do
so for the qdiscs which are within real_num_tx_queues when the
device is registered. Depending on order of calls in the driver,
or if user space changes config via ethtool -L the number of
qdiscs visible under tc qdisc show will differ from the number
of queues. This is confusing to users and potentially to system
configuration scripts which try to make sure qdiscs have the
right parameters.
Add a new Qdisc_ops callback and make relevant qdiscs TTRT.
Note that this uncovers the "shortcut" created by
commit
|
|
Yunsheng Lin | 06f5553e0f |
net: sched: fix lockdep_set_class() typo error for sch->seqlock
According to comment in qdisc_alloc(), sch->seqlock's lockdep
class key should be set to qdisc_tx_busylock, due to possible
type error, sch->busylock's lockdep class key is set to
qdisc_tx_busylock, which is duplicated because sch->busylock's
lockdep class key is already set in qdisc_alloc().
So fix it by replacing sch->busylock with sch->seqlock.
Fixes:
|
|
Yunsheng Lin | d3e0f57501 |
net: sched: remove qdisc->empty for lockless qdisc
As MISSED and DRAINING state are used to indicate a non-empty qdisc, qdisc->empty is not longer needed, so remove it. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Yunsheng Lin | c4fef01ba4 |
net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc
Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
flag set, but queue discipline by-pass does not work for lockless
qdisc because skb is always enqueued to qdisc even when the qdisc
is empty, see __dev_xmit_skb().
This patch calls sch_direct_xmit() to transmit the skb directly
to the driver for empty lockless qdisc, which aviod enqueuing
and dequeuing operation.
As qdisc->empty is not reliable to indicate a empty qdisc because
there is a time window between enqueuing and setting qdisc->empty.
So we use the MISSED state added in commit
|
|
Jakub Kicinski | 490dcecabb |
mlx5: count all link events
mlx5 devices were observed generating MLX5_PORT_CHANGE_SUBTYPE_ACTIVE events without an intervening MLX5_PORT_CHANGE_SUBTYPE_DOWN. This breaks link flap detection based on Linux carrier state transition count as netif_carrier_on() does nothing if carrier is already on. Make sure we count such events. netif_carrier_event() increments the counters and fires the linkwatch events. The latter is not necessary for the use case but seems like the right thing to do. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> |
|
Yunsheng Lin | dcad9ee9e0 |
net: sched: fix tx action reschedule issue with stopped queue
The netdev qeueue might be stopped when byte queue limit has
reached or tx hw ring is full, net_tx_action() may still be
rescheduled if STATE_MISSED is set, which consumes unnecessary
cpu without dequeuing and transmiting any skb because the
netdev queue is stopped, see qdisc_run_end().
This patch fixes it by checking the netdev queue state before
calling qdisc_run() and clearing STATE_MISSED if netdev queue is
stopped during qdisc_run(), the net_tx_action() is rescheduled
again when netdev qeueue is restarted, see netif_tx_wake_queue().
As there is time window between netif_xmit_frozen_or_stopped()
checking and STATE_MISSED clearing, between which STATE_MISSED
may set by net_tx_action() scheduled by netif_tx_wake_queue(),
so set the STATE_MISSED again if netdev queue is restarted.
Fixes:
|
|
Yunsheng Lin | 102b55ee92 |
net: sched: fix tx action rescheduling issue during deactivation
Currently qdisc_run() checks the STATE_DEACTIVATED of lockless qdisc before calling __qdisc_run(), which ultimately clear the STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED is set before clearing STATE_MISSED, there may be rescheduling of net_tx_action() at the end of qdisc_run_end(), see below: CPU0(net_tx_atcion) CPU1(__dev_xmit_skb) CPU2(dev_deactivate) . . . . set STATE_MISSED . . __netif_schedule() . . . set STATE_DEACTIVATED . . qdisc_reset() . . . .<--------------- . synchronize_net() clear __QDISC_STATE_SCHED | . . . | . . . | . some_qdisc_is_busy() . | . return *false* . | . . test STATE_DEACTIVATED | . . __qdisc_run() *not* called | . . . | . . test STATE_MISS | . . __netif_schedule()--------| . . . . . . . . __qdisc_run() is not called by net_tx_atcion() in CPU0 because CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule is called at the end of qdisc_run_end(), causing tx action rescheduling problem. qdisc_run() called by net_tx_action() runs in the softirq context, which should has the same semantic as the qdisc_run() called by __dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a synchronize_net() between STATE_DEACTIVATED flag being set and qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely bail out for the deactived lockless qdisc in net_tx_action(), and qdisc_reset() will reset all skb not dequeued yet. So add the rcu_read_lock() explicitly to protect the qdisc_run() and do the STATE_DEACTIVATED checking in net_tx_action() before calling qdisc_run_begin(). Another option is to do the checking in the qdisc_run_end(), but it will add unnecessary overhead for non-tx_action case, because __dev_queue_xmit() will not see qdisc with STATE_DEACTIVATED after synchronize_net(), the qdisc with STATE_DEACTIVATED can only be seen by net_tx_action() because of __netif_schedule(). The STATE_DEACTIVATED checking in qdisc_run() is to avoid race between net_tx_action() and qdisc_reset(), see: commit |
|
Yunsheng Lin | a90c57f2ce |
net: sched: fix packet stuck problem for lockless qdisc
Lockless qdisc has below concurrent problem:
cpu0 cpu1
. .
q->enqueue .
. .
qdisc_run_begin() .
. .
dequeue_skb() .
. .
sch_direct_xmit() .
. .
. q->enqueue
. qdisc_run_begin()
. return and do nothing
. .
qdisc_run_end() .
cpu1 enqueue a skb without calling __qdisc_run() because cpu0
has not released the lock yet and spin_trylock() return false
for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
enqueued by cpu1 when calling dequeue_skb() because cpu1 may
enqueue the skb after cpu0 calling dequeue_skb() and before
cpu0 calling qdisc_run_end().
Lockless qdisc has below another concurrent problem when
tx_action is involved:
cpu0(serving tx_action) cpu1 cpu2
. . .
. q->enqueue .
. qdisc_run_begin() .
. dequeue_skb() .
. . q->enqueue
. . .
. sch_direct_xmit() .
. . qdisc_run_begin()
. . return and do nothing
. . .
clear __QDISC_STATE_SCHED . .
qdisc_run_begin() . .
return and do nothing . .
. . .
. qdisc_run_end() .
This patch fixes the above data race by:
1. If the first spin_trylock() return false and STATE_MISSED is
not set, set STATE_MISSED and retry another spin_trylock() in
case other CPU may not see STATE_MISSED after it releases the
lock.
2. reschedule if STATE_MISSED is set after the lock is released
at the end of qdisc_run_end().
For tx_action case, STATE_MISSED is also set when cpu1 is at the
end if qdisc_run_end(), so tx_action will be rescheduled again
to dequeue the skb enqueued by cpu2.
Clear STATE_MISSED before retrying a dequeuing when dequeuing
returns NULL in order to reduce the overhead of the second
spin_trylock() and __netif_schedule() calling.
Also clear the STATE_MISSED before calling __netif_schedule()
at the end of qdisc_run_end() to avoid doing another round of
dequeuing in the pfifo_fast_dequeue().
The performance impact of this patch, tested using pktgen and
dummy netdev with pfifo_fast qdisc attached:
threads without+this_patch with+this_patch delta
1 2.61Mpps 2.60Mpps -0.3%
2 3.97Mpps 3.82Mpps -3.7%
4 5.62Mpps 5.59Mpps -0.5%
8 2.78Mpps 2.77Mpps -0.3%
16 2.22Mpps 2.22Mpps -0.0%
Fixes:
|
|
Baowen Zheng | 2ffe039528 |
net/sched: act_police: add support for packet-per-second policing
Allow a policer action to enforce a rate-limit based on packets-per-second, configurable using a packet-per-second rate and burst parameters. e.g. tc filter add dev tap1 parent ffff: u32 match \ u32 0 0 police pkts_rate 3000 pkts_burst 1000 Testing was unable to uncover a performance impact of this change on existing features. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Eric Dumazet | 846e463a70 |
net/sched: get rid of qdisc->padded
kmalloc() of sufficiently big portion of memory is cache-aligned in regular conditions. If some debugging options are used, there is no reason qdisc structures would need 64-byte alignment if most other kernel structures are not aligned. This get rid of QDISC_ALIGN and QDISC_ALIGNTO. Addition of privdata field will help implementing the reverse of qdisc_priv() and documents where the private data is. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Allen Pais <allen.lkml@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
Yunsheng Lin | 2fb541c862 |
net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
Currently there is concurrent reset and enqueue operation for the
same lockless qdisc when there is no lock to synchronize the
q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
qdisc_deactivate() called by dev_deactivate_queue(), which may cause
out-of-bounds access for priv->ring[] in hns3 driver if user has
requested a smaller queue num when __dev_xmit_skb() still enqueue a
skb with a larger queue_mapping after the corresponding qdisc is
reset, and call hns3_nic_net_xmit() with that skb later.
Reused the existing synchronize_net() in dev_deactivate_many() to
make sure skb with larger queue_mapping enqueued to old qdisc(which
is saved in dev_queue->qdisc_sleeping) will always be reset when
dev_reset_queue() is called.
Fixes:
|
|
Valentin Longchamp | 1a3db27ad9 |
net: sched: export __netdev_watchdog_up()
Since the quiesce/activate rework, __netdev_watchdog_up() is directly
called in the ucc_geth driver.
Unfortunately, this function is not available for modules and thus
ucc_geth cannot be built as a module anymore. Fix it by exporting
__netdev_watchdog_up().
Since the commit introducing the regression was backported to stable
branches, this one should ideally be as well.
Fixes:
|
|
Cong Wang | 759ae57f1b |
net_sched: get rid of unnecessary dev_qdisc_reset()
Resetting old qdisc on dev_queue->qdisc_sleeping in dev_qdisc_reset() is redundant, because this qdisc, even if not same with dev_queue->qdisc, is reset via qdisc_put() right after calling dev_graft_qdisc() when hitting refcnt 0. This is very easy to observe with qdisc_reset() tracepoint and stack traces. Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Cong Wang | 70f5096533 |
net_sched: avoid resetting active qdisc for multiple times
Except for sch_mq and sch_mqprio, each dev queue points to the same root qdisc, so when we reset the dev queues with netdev_for_each_tx_queue() we end up resetting the same instance of the root qdisc for multiple times. Avoid this by checking the __QDISC_STATE_DEACTIVATED bit in each iteration, so for sch_mq/sch_mqprio, we still reset all of them like before, for the rest, we only reset it once. Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Cong Wang | f5a7833e83 |
net_sched: add a tracepoint for qdisc creation
With this tracepoint, we could know when qdisc's are created, especially those default qdisc's. Sample output: tc-736 [001] ...1 56.230107: qdisc_create: dev=ens3 kind=pfifo parent=1:0 tc-736 [001] ...1 56.230113: qdisc_create: dev=ens3 kind=hfsc parent=ffff:ffff tc-738 [001] ...1 56.256816: qdisc_create: dev=ens3 kind=pfifo parent=1:100 tc-739 [001] ...1 56.267584: qdisc_create: dev=ens3 kind=pfifo parent=1:200 tc-740 [001] ...1 56.279649: qdisc_create: dev=ens3 kind=fq_codel parent=1:100 tc-741 [001] ...1 56.289996: qdisc_create: dev=ens3 kind=pfifo_fast parent=1:200 tc-745 [000] .N.1 111.687483: qdisc_create: dev=ens3 kind=ingress parent=ffff:fff1 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Cong Wang | a34dac0b90 |
net_sched: add tracepoints for qdisc_reset() and qdisc_destroy()
Add two tracepoints for qdisc_reset() and qdisc_destroy() to track qdisc resetting and destroying. Sample output: tc-756 [000] ...3 138.355662: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355720: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355867: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355930: qdisc_destroy: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-757 [000] ...2 143.073780: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.073878: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.074114: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.074228: qdisc_destroy: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Cong Wang | 4909daba37 |
net_sched: use qdisc_reset() in qdisc_destroy()
qdisc_destroy() calls ops->reset() and cleans up qdisc->gso_skb and qdisc->skb_bad_txq, these are nearly same with qdisc_reset(), so just call it directly, and cosolidate the code for the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Cong Wang | 1a33e10e4a |
net: partially revert dynamic lockdep key changes
This patch reverts the folowing commits: commit |
|
Jesper Dangaard Brouer | bf6dba76d2 |
net: sched: fallback to qdisc noqueue if default qdisc setup fail
Currently if the default qdisc setup/init fails, the device ends up with qdisc "noop", which causes all TX packets to get dropped. With the introduction of sysctl net/core/default_qdisc it is possible to change the default qdisc to be more advanced, which opens for the possibility that Qdisc_ops->init() can fail. This patch detect these kind of failures, and choose to fallback to qdisc "noqueue", which is so simple that its init call will not fail. This allows the interface to continue functioning. V2: As this also captures memory failures, which are transient, the device is not kept in IFF_NO_QUEUE state. This allows the net_device to retry to default qdisc assignment. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Paul Blakey | 7d17c544cd |
net: sched: Pass ingress block to tcf_classify_ingress
On ingress and cls_act qdiscs init, save the block on ingress mini_Qdisc and and pass it on to ingress classification, so it can be used for the looking up a specified chain index. Co-developed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> |
|
Michael S. Tsirkin | 0290bd291c |
netdev: pass the stuck queue to the timeout handler
This allows incrementing the correct timeout statistic without any mess. Down the road, devices can learn to reset just the specific queue. The patch was generated with the following script: use strict; use warnings; our $^I = '.bak'; my @work = ( ["arch/m68k/emu/nfeth.c", "nfeth_tx_timeout"], ["arch/um/drivers/net_kern.c", "uml_net_tx_timeout"], ["arch/um/drivers/vector_kern.c", "vector_net_tx_timeout"], ["arch/xtensa/platforms/iss/network.c", "iss_net_tx_timeout"], ["drivers/char/pcmcia/synclink_cs.c", "hdlcdev_tx_timeout"], ["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"], ["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"], ["drivers/message/fusion/mptlan.c", "mpt_lan_tx_timeout"], ["drivers/misc/sgi-xp/xpnet.c", "xpnet_dev_tx_timeout"], ["drivers/net/appletalk/cops.c", "cops_timeout"], ["drivers/net/arcnet/arcdevice.h", "arcnet_timeout"], ["drivers/net/arcnet/arcnet.c", "arcnet_timeout"], ["drivers/net/arcnet/com20020.c", "arcnet_timeout"], ["drivers/net/ethernet/3com/3c509.c", "el3_tx_timeout"], ["drivers/net/ethernet/3com/3c515.c", "corkscrew_timeout"], ["drivers/net/ethernet/3com/3c574_cs.c", "el3_tx_timeout"], ["drivers/net/ethernet/3com/3c589_cs.c", "el3_tx_timeout"], ["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"], ["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"], ["drivers/net/ethernet/3com/typhoon.c", "typhoon_tx_timeout"], ["drivers/net/ethernet/8390/8390.h", "ei_tx_timeout"], ["drivers/net/ethernet/8390/8390.h", "eip_tx_timeout"], ["drivers/net/ethernet/8390/8390.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/8390p.c", "eip_tx_timeout"], ["drivers/net/ethernet/8390/ax88796.c", "ax_ei_tx_timeout"], ["drivers/net/ethernet/8390/axnet_cs.c", "axnet_tx_timeout"], ["drivers/net/ethernet/8390/etherh.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/hydra.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/mac8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/mcf8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/lib8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/ne2k-pci.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/pcnet_cs.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/smc-ultra.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/wd.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/zorro8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/adaptec/starfire.c", "tx_timeout"], ["drivers/net/ethernet/agere/et131x.c", "et131x_tx_timeout"], ["drivers/net/ethernet/allwinner/sun4i-emac.c", "emac_timeout"], ["drivers/net/ethernet/alteon/acenic.c", "ace_watchdog"], ["drivers/net/ethernet/amazon/ena/ena_netdev.c", "ena_tx_timeout"], ["drivers/net/ethernet/amd/7990.h", "lance_tx_timeout"], ["drivers/net/ethernet/amd/7990.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/a2065.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/am79c961a.c", "am79c961_timeout"], ["drivers/net/ethernet/amd/amd8111e.c", "amd8111e_tx_timeout"], ["drivers/net/ethernet/amd/ariadne.c", "ariadne_tx_timeout"], ["drivers/net/ethernet/amd/atarilance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/au1000_eth.c", "au1000_tx_timeout"], ["drivers/net/ethernet/amd/declance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/lance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/mvme147.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/ni65.c", "ni65_timeout"], ["drivers/net/ethernet/amd/nmclan_cs.c", "mace_tx_timeout"], ["drivers/net/ethernet/amd/pcnet32.c", "pcnet32_tx_timeout"], ["drivers/net/ethernet/amd/sunlance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/xgbe/xgbe-drv.c", "xgbe_tx_timeout"], ["drivers/net/ethernet/apm/xgene-v2/main.c", "xge_timeout"], ["drivers/net/ethernet/apm/xgene/xgene_enet_main.c", "xgene_enet_timeout"], ["drivers/net/ethernet/apple/macmace.c", "mace_tx_timeout"], ["drivers/net/ethernet/atheros/ag71xx.c", "ag71xx_tx_timeout"], ["drivers/net/ethernet/atheros/alx/main.c", "alx_tx_timeout"], ["drivers/net/ethernet/atheros/atl1c/atl1c_main.c", "atl1c_tx_timeout"], ["drivers/net/ethernet/atheros/atl1e/atl1e_main.c", "atl1e_tx_timeout"], ["drivers/net/ethernet/atheros/atlx/atl.c", "atlx_tx_timeout"], ["drivers/net/ethernet/atheros/atlx/atl1.c", "atlx_tx_timeout"], ["drivers/net/ethernet/atheros/atlx/atl2.c", "atl2_tx_timeout"], ["drivers/net/ethernet/broadcom/b44.c", "b44_tx_timeout"], ["drivers/net/ethernet/broadcom/bcmsysport.c", "bcm_sysport_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2.c", "bnx2_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h", "bnx2x_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c", "bnx2x_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c", "bnx2x_tx_timeout"], ["drivers/net/ethernet/broadcom/bnxt/bnxt.c", "bnxt_tx_timeout"], ["drivers/net/ethernet/broadcom/genet/bcmgenet.c", "bcmgenet_timeout"], ["drivers/net/ethernet/broadcom/sb1250-mac.c", "sbmac_tx_timeout"], ["drivers/net/ethernet/broadcom/tg3.c", "tg3_tx_timeout"], ["drivers/net/ethernet/calxeda/xgmac.c", "xgmac_tx_timeout"], ["drivers/net/ethernet/cavium/liquidio/lio_main.c", "liquidio_tx_timeout"], ["drivers/net/ethernet/cavium/liquidio/lio_vf_main.c", "liquidio_tx_timeout"], ["drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c", "lio_vf_rep_tx_timeout"], ["drivers/net/ethernet/cavium/thunder/nicvf_main.c", "nicvf_tx_timeout"], ["drivers/net/ethernet/cirrus/cs89x0.c", "net_timeout"], ["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"], ["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"], ["drivers/net/ethernet/cortina/gemini.c", "gmac_tx_timeout"], ["drivers/net/ethernet/davicom/dm9000.c", "dm9000_timeout"], ["drivers/net/ethernet/dec/tulip/de2104x.c", "de_tx_timeout"], ["drivers/net/ethernet/dec/tulip/tulip_core.c", "tulip_tx_timeout"], ["drivers/net/ethernet/dec/tulip/winbond-840.c", "tx_timeout"], ["drivers/net/ethernet/dlink/dl2k.c", "rio_tx_timeout"], ["drivers/net/ethernet/dlink/sundance.c", "tx_timeout"], ["drivers/net/ethernet/emulex/benet/be_main.c", "be_tx_timeout"], ["drivers/net/ethernet/ethoc.c", "ethoc_tx_timeout"], ["drivers/net/ethernet/faraday/ftgmac100.c", "ftgmac100_tx_timeout"], ["drivers/net/ethernet/fealnx.c", "fealnx_tx_timeout"], ["drivers/net/ethernet/freescale/dpaa/dpaa_eth.c", "dpaa_tx_timeout"], ["drivers/net/ethernet/freescale/fec_main.c", "fec_timeout"], ["drivers/net/ethernet/freescale/fec_mpc52xx.c", "mpc52xx_fec_tx_timeout"], ["drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c", "fs_timeout"], ["drivers/net/ethernet/freescale/gianfar.c", "gfar_timeout"], ["drivers/net/ethernet/freescale/ucc_geth.c", "ucc_geth_timeout"], ["drivers/net/ethernet/fujitsu/fmvj18x_cs.c", "fjn_tx_timeout"], ["drivers/net/ethernet/google/gve/gve_main.c", "gve_tx_timeout"], ["drivers/net/ethernet/hisilicon/hip04_eth.c", "hip04_timeout"], ["drivers/net/ethernet/hisilicon/hix5hd2_gmac.c", "hix5hd2_net_timeout"], ["drivers/net/ethernet/hisilicon/hns/hns_enet.c", "hns_nic_net_timeout"], ["drivers/net/ethernet/hisilicon/hns3/hns3_enet.c", "hns3_nic_net_timeout"], ["drivers/net/ethernet/huawei/hinic/hinic_main.c", "hinic_tx_timeout"], ["drivers/net/ethernet/i825xx/82596.c", "i596_tx_timeout"], ["drivers/net/ethernet/i825xx/ether1.c", "ether1_timeout"], ["drivers/net/ethernet/i825xx/lib82596.c", "i596_tx_timeout"], ["drivers/net/ethernet/i825xx/sun3_82586.c", "sun3_82586_timeout"], ["drivers/net/ethernet/ibm/ehea/ehea_main.c", "ehea_tx_watchdog"], ["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"], ["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"], ["drivers/net/ethernet/ibm/ibmvnic.c", "ibmvnic_tx_timeout"], ["drivers/net/ethernet/intel/e100.c", "e100_tx_timeout"], ["drivers/net/ethernet/intel/e1000/e1000_main.c", "e1000_tx_timeout"], ["drivers/net/ethernet/intel/e1000e/netdev.c", "e1000_tx_timeout"], ["drivers/net/ethernet/intel/fm10k/fm10k_netdev.c", "fm10k_tx_timeout"], ["drivers/net/ethernet/intel/i40e/i40e_main.c", "i40e_tx_timeout"], ["drivers/net/ethernet/intel/iavf/iavf_main.c", "iavf_tx_timeout"], ["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"], ["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"], ["drivers/net/ethernet/intel/igb/igb_main.c", "igb_tx_timeout"], ["drivers/net/ethernet/intel/igbvf/netdev.c", "igbvf_tx_timeout"], ["drivers/net/ethernet/intel/ixgb/ixgb_main.c", "ixgb_tx_timeout"], ["drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c", "adapter->netdev->netdev_ops->ndo_tx_timeout(adapter->netdev);"], ["drivers/net/ethernet/intel/ixgbe/ixgbe_main.c", "ixgbe_tx_timeout"], ["drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c", "ixgbevf_tx_timeout"], ["drivers/net/ethernet/jme.c", "jme_tx_timeout"], ["drivers/net/ethernet/korina.c", "korina_tx_timeout"], ["drivers/net/ethernet/lantiq_etop.c", "ltq_etop_tx_timeout"], ["drivers/net/ethernet/marvell/mv643xx_eth.c", "mv643xx_eth_tx_timeout"], ["drivers/net/ethernet/marvell/pxa168_eth.c", "pxa168_eth_tx_timeout"], ["drivers/net/ethernet/marvell/skge.c", "skge_tx_timeout"], ["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"], ["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"], ["drivers/net/ethernet/mediatek/mtk_eth_soc.c", "mtk_tx_timeout"], ["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"], ["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"], ["drivers/net/ethernet/mellanox/mlx5/core/en_main.c", "mlx5e_tx_timeout"], ["drivers/net/ethernet/micrel/ks8842.c", "ks8842_tx_timeout"], ["drivers/net/ethernet/micrel/ksz884x.c", "netdev_tx_timeout"], ["drivers/net/ethernet/microchip/enc28j60.c", "enc28j60_tx_timeout"], ["drivers/net/ethernet/microchip/encx24j600.c", "encx24j600_tx_timeout"], ["drivers/net/ethernet/natsemi/sonic.h", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/sonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/jazzsonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/macsonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/natsemi.c", "ns_tx_timeout"], ["drivers/net/ethernet/natsemi/ns83820.c", "ns83820_tx_timeout"], ["drivers/net/ethernet/natsemi/xtsonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/neterion/s2io.h", "s2io_tx_watchdog"], ["drivers/net/ethernet/neterion/s2io.c", "s2io_tx_watchdog"], ["drivers/net/ethernet/neterion/vxge/vxge-main.c", "vxge_tx_watchdog"], ["drivers/net/ethernet/netronome/nfp/nfp_net_common.c", "nfp_net_tx_timeout"], ["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"], ["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"], ["drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c", "pch_gbe_tx_timeout"], ["drivers/net/ethernet/packetengines/hamachi.c", "hamachi_tx_timeout"], ["drivers/net/ethernet/packetengines/yellowfin.c", "yellowfin_tx_timeout"], ["drivers/net/ethernet/pensando/ionic/ionic_lif.c", "ionic_tx_timeout"], ["drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c", "netxen_tx_timeout"], ["drivers/net/ethernet/qlogic/qla3xxx.c", "ql3xxx_tx_timeout"], ["drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c", "qlcnic_tx_timeout"], ["drivers/net/ethernet/qualcomm/emac/emac.c", "emac_tx_timeout"], ["drivers/net/ethernet/qualcomm/qca_spi.c", "qcaspi_netdev_tx_timeout"], ["drivers/net/ethernet/qualcomm/qca_uart.c", "qcauart_netdev_tx_timeout"], ["drivers/net/ethernet/rdc/r6040.c", "r6040_tx_timeout"], ["drivers/net/ethernet/realtek/8139cp.c", "cp_tx_timeout"], ["drivers/net/ethernet/realtek/8139too.c", "rtl8139_tx_timeout"], ["drivers/net/ethernet/realtek/atp.c", "tx_timeout"], ["drivers/net/ethernet/realtek/r8169_main.c", "rtl8169_tx_timeout"], ["drivers/net/ethernet/renesas/ravb_main.c", "ravb_tx_timeout"], ["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"], ["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"], ["drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c", "sxgbe_tx_timeout"], ["drivers/net/ethernet/seeq/ether3.c", "ether3_timeout"], ["drivers/net/ethernet/seeq/sgiseeq.c", "timeout"], ["drivers/net/ethernet/sfc/efx.c", "efx_watchdog"], ["drivers/net/ethernet/sfc/falcon/efx.c", "ef4_watchdog"], ["drivers/net/ethernet/sgi/ioc3-eth.c", "ioc3_timeout"], ["drivers/net/ethernet/sgi/meth.c", "meth_tx_timeout"], ["drivers/net/ethernet/silan/sc92031.c", "sc92031_tx_timeout"], ["drivers/net/ethernet/sis/sis190.c", "sis190_tx_timeout"], ["drivers/net/ethernet/sis/sis900.c", "sis900_tx_timeout"], ["drivers/net/ethernet/smsc/epic100.c", "epic_tx_timeout"], ["drivers/net/ethernet/smsc/smc911x.c", "smc911x_timeout"], ["drivers/net/ethernet/smsc/smc9194.c", "smc_timeout"], ["drivers/net/ethernet/smsc/smc91c92_cs.c", "smc_tx_timeout"], ["drivers/net/ethernet/smsc/smc91x.c", "smc_timeout"], ["drivers/net/ethernet/stmicro/stmmac/stmmac_main.c", "stmmac_tx_timeout"], ["drivers/net/ethernet/sun/cassini.c", "cas_tx_timeout"], ["drivers/net/ethernet/sun/ldmvsw.c", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/sun/niu.c", "niu_tx_timeout"], ["drivers/net/ethernet/sun/sunbmac.c", "bigmac_tx_timeout"], ["drivers/net/ethernet/sun/sungem.c", "gem_tx_timeout"], ["drivers/net/ethernet/sun/sunhme.c", "happy_meal_tx_timeout"], ["drivers/net/ethernet/sun/sunqe.c", "qe_tx_timeout"], ["drivers/net/ethernet/sun/sunvnet.c", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/sun/sunvnet_common.c", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/sun/sunvnet_common.h", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/synopsys/dwc-xlgmac-net.c", "xlgmac_tx_timeout"], ["drivers/net/ethernet/ti/cpmac.c", "cpmac_tx_timeout"], ["drivers/net/ethernet/ti/cpsw.c", "cpsw_ndo_tx_timeout"], ["drivers/net/ethernet/ti/cpsw_priv.c", "cpsw_ndo_tx_timeout"], ["drivers/net/ethernet/ti/cpsw_priv.h", "cpsw_ndo_tx_timeout"], ["drivers/net/ethernet/ti/davinci_emac.c", "emac_dev_tx_timeout"], ["drivers/net/ethernet/ti/netcp_core.c", "netcp_ndo_tx_timeout"], ["drivers/net/ethernet/ti/tlan.c", "tlan_tx_timeout"], ["drivers/net/ethernet/toshiba/ps3_gelic_net.h", "gelic_net_tx_timeout"], ["drivers/net/ethernet/toshiba/ps3_gelic_net.c", "gelic_net_tx_timeout"], ["drivers/net/ethernet/toshiba/ps3_gelic_wireless.c", "gelic_net_tx_timeout"], ["drivers/net/ethernet/toshiba/spider_net.c", "spider_net_tx_timeout"], ["drivers/net/ethernet/toshiba/tc35815.c", "tc35815_tx_timeout"], ["drivers/net/ethernet/via/via-rhine.c", "rhine_tx_timeout"], ["drivers/net/ethernet/wiznet/w5100.c", "w5100_tx_timeout"], ["drivers/net/ethernet/wiznet/w5300.c", "w5300_tx_timeout"], ["drivers/net/ethernet/xilinx/xilinx_emaclite.c", "xemaclite_tx_timeout"], ["drivers/net/ethernet/xircom/xirc2ps_cs.c", "xirc_tx_timeout"], ["drivers/net/fjes/fjes_main.c", "fjes_tx_retry"], ["drivers/net/slip/slip.c", "sl_tx_timeout"], ["include/linux/usb/usbnet.h", "usbnet_tx_timeout"], ["drivers/net/usb/aqc111.c", "usbnet_tx_timeout"], ["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"], ["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"], ["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"], ["drivers/net/usb/ax88172a.c", "usbnet_tx_timeout"], ["drivers/net/usb/ax88179_178a.c", "usbnet_tx_timeout"], ["drivers/net/usb/catc.c", "catc_tx_timeout"], ["drivers/net/usb/cdc_mbim.c", "usbnet_tx_timeout"], ["drivers/net/usb/cdc_ncm.c", "usbnet_tx_timeout"], ["drivers/net/usb/dm9601.c", "usbnet_tx_timeout"], ["drivers/net/usb/hso.c", "hso_net_tx_timeout"], ["drivers/net/usb/int51x1.c", "usbnet_tx_timeout"], ["drivers/net/usb/ipheth.c", "ipheth_tx_timeout"], ["drivers/net/usb/kaweth.c", "kaweth_tx_timeout"], ["drivers/net/usb/lan78xx.c", "lan78xx_tx_timeout"], ["drivers/net/usb/mcs7830.c", "usbnet_tx_timeout"], ["drivers/net/usb/pegasus.c", "pegasus_tx_timeout"], ["drivers/net/usb/qmi_wwan.c", "usbnet_tx_timeout"], ["drivers/net/usb/r8152.c", "rtl8152_tx_timeout"], ["drivers/net/usb/rndis_host.c", "usbnet_tx_timeout"], ["drivers/net/usb/rtl8150.c", "rtl8150_tx_timeout"], ["drivers/net/usb/sierra_net.c", "usbnet_tx_timeout"], ["drivers/net/usb/smsc75xx.c", "usbnet_tx_timeout"], ["drivers/net/usb/smsc95xx.c", "usbnet_tx_timeout"], ["drivers/net/usb/sr9700.c", "usbnet_tx_timeout"], ["drivers/net/usb/sr9800.c", "usbnet_tx_timeout"], ["drivers/net/usb/usbnet.c", "usbnet_tx_timeout"], ["drivers/net/vmxnet3/vmxnet3_drv.c", "vmxnet3_tx_timeout"], ["drivers/net/wan/cosa.c", "cosa_net_timeout"], ["drivers/net/wan/farsync.c", "fst_tx_timeout"], ["drivers/net/wan/fsl_ucc_hdlc.c", "uhdlc_tx_timeout"], ["drivers/net/wan/lmc/lmc_main.c", "lmc_driver_timeout"], ["drivers/net/wan/x25_asy.c", "x25_asy_timeout"], ["drivers/net/wimax/i2400m/netdev.c", "i2400m_tx_timeout"], ["drivers/net/wireless/intel/ipw2x00/ipw2100.c", "ipw2100_tx_timeout"], ["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"], ["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"], ["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"], ["drivers/net/wireless/intersil/orinoco/main.c", "orinoco_tx_timeout"], ["drivers/net/wireless/intersil/orinoco/orinoco_usb.c", "orinoco_tx_timeout"], ["drivers/net/wireless/intersil/orinoco/orinoco.h", "orinoco_tx_timeout"], ["drivers/net/wireless/intersil/prism54/islpci_dev.c", "islpci_eth_tx_timeout"], ["drivers/net/wireless/intersil/prism54/islpci_eth.c", "islpci_eth_tx_timeout"], ["drivers/net/wireless/intersil/prism54/islpci_eth.h", "islpci_eth_tx_timeout"], ["drivers/net/wireless/marvell/mwifiex/main.c", "mwifiex_tx_timeout"], ["drivers/net/wireless/quantenna/qtnfmac/core.c", "qtnf_netdev_tx_timeout"], ["drivers/net/wireless/quantenna/qtnfmac/core.h", "qtnf_netdev_tx_timeout"], ["drivers/net/wireless/rndis_wlan.c", "usbnet_tx_timeout"], ["drivers/net/wireless/wl3501_cs.c", "wl3501_tx_timeout"], ["drivers/net/wireless/zydas/zd1201.c", "zd1201_tx_timeout"], ["drivers/s390/net/qeth_core.h", "qeth_tx_timeout"], ["drivers/s390/net/qeth_core_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"], ["drivers/staging/ks7010/ks_wlan_net.c", "ks_wlan_tx_timeout"], ["drivers/staging/qlge/qlge_main.c", "qlge_tx_timeout"], ["drivers/staging/rtl8192e/rtl8192e/rtl_core.c", "_rtl92e_tx_timeout"], ["drivers/staging/rtl8192u/r8192U_core.c", "tx_timeout"], ["drivers/staging/unisys/visornic/visornic_main.c", "visornic_xmit_timeout"], ["drivers/staging/wlan-ng/p80211netdev.c", "p80211knetdev_tx_timeout"], ["drivers/tty/n_gsm.c", "gsm_mux_net_tx_timeout"], ["drivers/tty/synclink.c", "hdlcdev_tx_timeout"], ["drivers/tty/synclink_gt.c", "hdlcdev_tx_timeout"], ["drivers/tty/synclinkmp.c", "hdlcdev_tx_timeout"], ["net/atm/lec.c", "lec_tx_timeout"], ["net/bluetooth/bnep/netdev.c", "bnep_net_timeout"] ); for my $p (@work) { my @pair = @$p; my $file = $pair[0]; my $func = $pair[1]; print STDERR $file , ": ", $func,"\n"; our @ARGV = ($file); while (<ARGV>) { if (m/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/) { print STDERR "found $1+$2 in $file\n"; } if (s/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/$1, unsigned int txqueue$2/) { print STDERR "$func found in $file\n"; } print; } } where the list of files and functions is simply from: git grep ndo_tx_timeout, with manual addition of headers in the rare cases where the function is from a header, then manually changing the few places which actually call ndo_tx_timeout. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Martin Habets <mhabets@solarflare.com> changes from v9: fixup a forward declaration changes from v9: more leftovers from v3 change changes from v8: fix up a missing direct call to timeout rebased on net-next changes from v7: fixup leftovers from v3 change changes from v6: fix typo in rtl driver changes from v5: add missing files (allow any net device argument name) changes from v4: add a missing driver header changes from v3: change queue # to unsigned Changes from v2: added headers Changes from v1: Fix errors found by kbuild: generalize the pattern a bit, to pick up a couple of instances missed by the previous version. Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Eric Dumazet | 90b2be27bb |
net/sched: annotate lockless accesses to qdisc->empty
KCSAN reported the following race [1]
BUG: KCSAN: data-race in __dev_queue_xmit / net_tx_action
read to 0xffff8880ba403508 of 1 bytes by task 21814 on cpu 1:
__dev_xmit_skb net/core/dev.c:3389 [inline]
__dev_queue_xmit+0x9db/0x1b40 net/core/dev.c:3761
dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
neigh_hh_output include/net/neighbour.h:500 [inline]
neigh_output include/net/neighbour.h:509 [inline]
ip6_finish_output2+0x873/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
__sys_sendmmsg+0x123/0x350 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
write to 0xffff8880ba403508 of 1 bytes by interrupt on cpu 0:
qdisc_run_begin include/net/sch_generic.h:160 [inline]
qdisc_run include/net/pkt_sched.h:120 [inline]
net_tx_action+0x2b1/0x6c0 net/core/dev.c:4551
__do_softirq+0x115/0x33f kernel/softirq.c:292
do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
do_softirq kernel/softirq.c:329 [inline]
__local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
local_bh_enable include/linux/bottom_half.h:32 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:688 [inline]
ip6_finish_output2+0x7bb/0xec0 net/ipv6/ip6_output.c:117
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
__sys_sendmmsg+0x123/0x350 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 21817 Comm: syz-executor.2 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes:
|
|
David S. Miller | d31e95585c |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Vincent Prince | 546b85bb0a |
net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
There is networking hardware that isn't based on Ethernet for layers 1 and 2. For example CAN. CAN is a multi-master serial bus standard for connecting Electronic Control Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes of payload. Frame corruption is detected by a CRC. However frame loss due to corruption is possible, but a quite unusual phenomenon. While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of legacy protocols on top of CAN, which are not build with flow control or high CAN frame drop rates in mind. When using fq_codel, as soon as the queue reaches a certain delay based length, skbs from the head of the queue are silently dropped. Silently meaning that the user space using a send() or similar syscall doesn't get an error. However TCP's flow control algorithm will detect dropped packages and adjust the bandwidth accordingly. When using fq_codel and sending raw frames over CAN, which is the common use case, the user space thinks the package has been sent without problems, because send() returned without an error. pfifo_fast will drop skbs, if the queue length exceeds the maximum. But with this scheduler the skbs at the tail are dropped, an error (-ENOBUFS) is propagated to user space. So that the user space can slow down the package generation. On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH during compile time, or set default during runtime with sysctl net.core.default_qdisc (see [1]), we get a bad user experience. In my test case with pfifo_fast, I can transfer thousands of million CAN frames without a frame drop. On the other hand with fq_codel there is more then one lost CAN frame per thousand frames. As pointed out fq_codel is not suited for CAN hardware, so this patch changes attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices. During transition of a netdev from down to up state the default queuing discipline is attached by attach_default_qdiscs() with the help of attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to attach the pfifo_fast (pfifo_fast_ops) if the network device type is "ARPHRD_CAN". [1] https://github.com/systemd/systemd/issues/9194 Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Vincent Prince | fa784f2ac0 |
net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
There is networking hardware that isn't based on Ethernet for layers 1 and 2. For example CAN. CAN is a multi-master serial bus standard for connecting Electronic Control Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes of payload. Frame corruption is detected by a CRC. However frame loss due to corruption is possible, but a quite unusual phenomenon. While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of legacy protocols on top of CAN, which are not build with flow control or high CAN frame drop rates in mind. When using fq_codel, as soon as the queue reaches a certain delay based length, skbs from the head of the queue are silently dropped. Silently meaning that the user space using a send() or similar syscall doesn't get an error. However TCP's flow control algorithm will detect dropped packages and adjust the bandwidth accordingly. When using fq_codel and sending raw frames over CAN, which is the common use case, the user space thinks the package has been sent without problems, because send() returned without an error. pfifo_fast will drop skbs, if the queue length exceeds the maximum. But with this scheduler the skbs at the tail are dropped, an error (-ENOBUFS) is propagated to user space. So that the user space can slow down the package generation. On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH during compile time, or set default during runtime with sysctl net.core.default_qdisc (see [1]), we get a bad user experience. In my test case with pfifo_fast, I can transfer thousands of million CAN frames without a frame drop. On the other hand with fq_codel there is more then one lost CAN frame per thousand frames. As pointed out fq_codel is not suited for CAN hardware, so this patch changes attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices. During transition of a netdev from down to up state the default queuing discipline is attached by attach_default_qdiscs() with the help of attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to attach the pfifo_fast (pfifo_fast_ops) if the network device type is "ARPHRD_CAN". [1] https://github.com/systemd/systemd/issues/9194 Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Taehee Yoo | ab92d68fc2 |
net: core: add generic lockdep keys
Some interface types could be nested. (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..) These interface types should set lockdep class because, without lockdep class key, lockdep always warn about unexisting circular locking. In the current code, these interfaces have their own lockdep class keys and these manage itself. So that there are so many duplicate code around the /driver/net and /net/. This patch adds new generic lockdep keys and some helper functions for it. This patch does below changes. a) Add lockdep class keys in struct net_device - qdisc_running, xmit, addr_list, qdisc_busylock - these keys are used as dynamic lockdep key. b) When net_device is being allocated, lockdep keys are registered. - alloc_netdev_mqs() c) When net_device is being free'd llockdep keys are unregistered. - free_netdev() d) Add generic lockdep key helper function - netdev_register_lockdep_key() - netdev_unregister_lockdep_key() - netdev_update_lockdep_key() e) Remove unnecessary generic lockdep macro and functions f) Remove unnecessary lockdep code of each interfaces. After this patch, each interface modules don't need to maintain their lockdep keys. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |