OpenCloudOS-Kernel/net/sched
Cong Wang d349f99768 net_sched: fix RTNL deadlock again caused by request_module()
tcf_action_init_1() loads tc action modules automatically with
request_module() after parsing the tc action names, and it drops RTNL
lock and re-holds it before and after request_module(). This causes a
lot of troubles, as discovered by syzbot, because we can be in the
middle of batch initializations when we create an array of tc actions.

One of the problem is deadlock:

CPU 0					CPU 1
rtnl_lock();
for (...) {
  tcf_action_init_1();
    -> rtnl_unlock();
    -> request_module();
				rtnl_lock();
				for (...) {
				  tcf_action_init_1();
				    -> tcf_idr_check_alloc();
				   // Insert one action into idr,
				   // but it is not committed until
				   // tcf_idr_insert_many(), then drop
				   // the RTNL lock in the _next_
				   // iteration
				   -> rtnl_unlock();
    -> rtnl_lock();
    -> a_o->init();
      -> tcf_idr_check_alloc();
      // Now waiting for the same index
      // to be committed
				    -> request_module();
				    -> rtnl_lock()
				    // Now waiting for RTNL lock
				}
				rtnl_unlock();
}
rtnl_unlock();

This is not easy to solve, we can move the request_module() before
this loop and pre-load all the modules we need for this netlink
message and then do the rest initializations. So the loop breaks down
to two now:

        for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                struct tc_action_ops *a_o;

                a_o = tc_action_load_ops(name, tb[i]...);
                ops[i - 1] = a_o;
        }

        for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                act = tcf_action_init_1(ops[i - 1]...);
        }

Although this looks serious, it only has been reported by syzbot, so it
seems hard to trigger this by humans. And given the size of this patch,
I'd suggest to make it to net-next and not to backport to stable.

This patch has been tested by syzbot and tested with tdc.py by me.

Fixes: 0fedc63fad ("net_sched: commit action insertions together")
Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com
Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18 20:13:55 -08:00
..
Kconfig net: sched: incorrect Kconfig dependencies on Netfilter modules 2020-12-09 15:49:29 -08:00
Makefile net/sched: sch_frag: add generic packet fragment support. 2020-11-27 14:36:02 -08:00
act_api.c net_sched: fix RTNL deadlock again caused by request_module() 2021-01-18 20:13:55 -08:00
act_bpf.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
act_connmark.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_csum.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_ct.c net/sched: act_ct: enable stats for HW offloaded entries 2020-11-28 11:43:40 -08:00
act_ctinfo.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
act_gact.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_gate.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
act_ife.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_ipt.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
act_meta_mark.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
act_meta_skbprio.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
act_meta_skbtcindex.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
act_mirred.c net/sched: sch_frag: add generic packet fragment support. 2020-11-27 14:36:02 -08:00
act_mpls.c net/sched: act_mpls: ensure LSE is pullable before reading it 2020-12-03 11:13:37 -08:00
act_nat.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_pedit.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_police.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_sample.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_simple.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
act_skbedit.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_skbmod.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_tunnel_key.c net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels 2020-10-20 21:10:41 -07:00
act_vlan.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
cls_api.c net_sched: fix RTNL deadlock again caused by request_module() 2021-01-18 20:13:55 -08:00
cls_basic.c net_sched: fix ops->bind_class() implementations 2020-01-27 10:51:43 +01:00
cls_bpf.c net_sched: fix ops->bind_class() implementations 2020-01-27 10:51:43 +01:00
cls_cgroup.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cls_flow.c Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
cls_flower.c net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower 2020-12-09 20:39:38 -08:00
cls_fw.c net_sched: fix ops->bind_class() implementations 2020-01-27 10:51:43 +01:00
cls_matchall.c net: qos offload add flow status with dropped count 2020-06-19 12:53:30 -07:00
cls_route.c net_sched: cls_route: remove the right filter from hashtable 2020-03-16 01:59:32 -07:00
cls_rsvp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cls_rsvp.h net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
cls_rsvp6.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cls_tcindex.c tcindex_change: Remove redundant null check 2020-06-22 20:55:09 -07:00
cls_u32.c net/sched: cls_u32: simplify the return expression of u32_reoffload_knode() 2020-12-08 16:22:53 -08:00
em_canid.c net: sched: kerneldoc fixes 2020-07-13 17:20:40 -07:00
em_cmp.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
em_ipset.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_ipt.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_meta.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_nbyte.c net: sched: Replace zero-length array with flexible-array member 2020-02-29 21:27:02 -08:00
em_text.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
em_u32.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ematch.c net: sched: kerneldoc fixes 2020-07-13 17:20:40 -07:00
sch_api.c net: sched: remove redundant 'rtnl_held' argument 2020-12-01 11:24:56 -08:00
sch_atm.c net: sched: convert tasklets to use new tasklet_setup() API 2020-11-07 10:41:15 -08:00
sch_blackhole.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_cake.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
sch_cbq.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
sch_cbs.c net: don't include ethtool.h from netdevice.h 2020-11-23 17:27:04 -08:00
sch_choke.c net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sch_codel.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_drr.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_dsmark.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_etf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_ets.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_fifo.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_fq.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_fq_codel.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
sch_fq_pie.c net/sched: fq_pie: initialize timer earlier in fq_pie_init() 2020-12-04 14:15:01 -08:00
sch_frag.c net/sched: sch_frag: add generic packet fragment support. 2020-11-27 14:36:02 -08:00
sch_generic.c net/sched: get rid of qdisc->padded 2020-10-09 08:08:08 -07:00
sch_gred.c net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sch_hfsc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
sch_hhf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_htb.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
sch_ingress.c net: sched: Pass ingress block to tcf_classify_ingress 2020-02-19 17:49:48 -08:00
sch_mq.c net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues 2019-12-03 11:53:55 -08:00
sch_mqprio.c mqprio: Fix out-of-bounds access in mqprio_dump 2019-12-06 11:58:45 -08:00
sch_multiq.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_netem.c netem: fix zero division in tabledist 2020-10-29 11:45:47 -07:00
sch_pie.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
sch_plug.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_prio.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_qfq.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_red.c net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sch_sfb.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_sfq.c net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sch_skbprio.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_taprio.c net/sched: sch_taprio: ensure to reset/destroy all child qdiscs 2020-12-18 16:43:29 -08:00
sch_tbf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_teql.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00