OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Davide Caratti	cd9b50adc6	net/sched: ets: fix crash when flipping from 'strict' to 'quantum' While running kselftests, Hangbin observed that sch_ets.sh often crashes, and splats like the following one are seen in the output of 'dmesg': BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 159f12067 P4D 159f12067 PUD 159f13067 PMD 0 Oops: 0000 [#1] SMP NOPTI CPU: 2 PID: 921 Comm: tc Not tainted 5.14.0-rc6+ #458 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: 0010:__list_del_entry_valid+0x2d/0x50 Code: 48 8b 57 08 48 b9 00 01 00 00 00 00 ad de 48 39 c8 0f 84 ac 6e 5b 00 48 b9 22 01 00 00 00 00 ad de 48 39 ca 0f 84 cf 6e 5b 00 <48> 8b 32 48 39 fe 0f 85 af 6e 5b 00 48 8b 50 08 48 39 f2 0f 85 94 RSP: 0018:ffffb2da005c3890 EFLAGS: 00010217 RAX: 0000000000000000 RBX: ffff9073ba23f800 RCX: dead000000000122 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9073ba23fbc8 RBP: ffff9073ba23f890 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: dead000000000100 R13: ffff9073ba23fb00 R14: 0000000000000002 R15: 0000000000000002 FS: 00007f93e5564e40(0000) GS:ffff9073bba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000014ad34000 CR4: 0000000000350ee0 Call Trace: ets_qdisc_reset+0x6e/0x100 [sch_ets] qdisc_reset+0x49/0x1d0 tbf_reset+0x15/0x60 [sch_tbf] qdisc_reset+0x49/0x1d0 dev_reset_queue.constprop.42+0x2f/0x90 dev_deactivate_many+0x1d3/0x3d0 dev_deactivate+0x56/0x90 qdisc_graft+0x47e/0x5a0 tc_get_qdisc+0x1db/0x3e0 rtnetlink_rcv_msg+0x164/0x4c0 netlink_rcv_skb+0x50/0x100 netlink_unicast+0x1a5/0x280 netlink_sendmsg+0x242/0x480 sock_sendmsg+0x5b/0x60 ____sys_sendmsg+0x1f2/0x260 ___sys_sendmsg+0x7c/0xc0 __sys_sendmsg+0x57/0xa0 do_syscall_64+0x3a/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f93e44b8338 Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 RSP: 002b:00007ffc0db737a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000061255c06 RCX: 00007f93e44b8338 RDX: 0000000000000000 RSI: 00007ffc0db73810 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000687880 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev i2c_i801 pcspkr i2c_smbus lpc_ich virtio_balloon ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 When the change() function decreases the value of 'nstrict', we must take into account that packets might be already enqueued on a class that flips from 'strict' to 'quantum': otherwise that class will not be added to the bandwidth-sharing list. Then, a call to ets_qdisc_reset() will attempt to do list_del(&alist) with 'alist' filled with zero, hence the NULL pointer dereference. For classes flipping from 'strict' to 'quantum', initialize an empty list and eventually add it to the bandwidth-sharing list, if there are packets already enqueued. In this way, the kernel will: a) prevent crashing as described above. b) avoid retaining the backlog packets (for an arbitrarily long time) in case no packet is enqueued after a change from 'strict' to 'quantum'. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: `dcc68b4d80` ("net: sch_ets: Add a new Qdisc") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-25 11:15:30 +01:00
Toke Høiland-Jørgensen	86b9bbd332	sch_cake: fix srchost/dsthost hashing mode When adding support for using the skb->hash value as the flow hash in CAKE, I accidentally introduced a logic error that broke the host-only isolation modes of CAKE (srchost and dsthost keywords). Specifically, the flow_hash variable should stay initialised to 0 in cake_hash() in pure host-based hashing mode. Add a check for this before using the skb->hash value as flow_hash. Fixes: `b0c19ed608` ("sch_cake: Take advantage of skb->hash where appropriate") Reported-by: Pete Heist <pete@heistp.net> Tested-by: Pete Heist <pete@heistp.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 10:14:00 +01:00
Hangbin Liu	d09c548dbf	net: sched: act_mirred: Reset ct info when mirror/redirect skb When mirror/redirect a skb to a different port, the ct info should be reset for reclassification. Or the pkts will match unexpected rules. For example, with following topology and commands: ----------- \| veth0 -+------- \| veth1 -+------- \| ------------ tc qdisc add dev veth0 clsact # The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1" tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1 tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1 tc qdisc add dev veth1 clsact tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop ping <remove ip via veth0> & tc -s filter show dev veth1 ingress With command 'tc -s filter show', we can find the pkts were dropped on veth1. Fixes: `b57dc7c13e` ("net/sched: Introduce action ct") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-09 10:58:47 +01:00
Yunsheng Lin	06f5553e0f	net: sched: fix lockdep_set_class() typo error for sch->seqlock According to comment in qdisc_alloc(), sch->seqlock's lockdep class key should be set to qdisc_tx_busylock, due to possible type error, sch->busylock's lockdep class key is set to qdisc_tx_busylock, which is duplicated because sch->busylock's lockdep class key is already set in qdisc_alloc(). So fix it by replacing sch->busylock with sch->seqlock. Fixes: `96009c7d50` ("sched: replace __QDISC_STATE_RUNNING bit with a spin lock") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-04 09:59:24 +01:00
Yannick Vignon	ebca25ead0	net/sched: taprio: Fix init procedure Commit `13511704f8` ("net: taprio offload: enforce qdisc to netdev queue mapping") resulted in duplicate entries in the qdisc hash. While this did not impact the overall operation of the qdisc and taprio code paths, it did result in an infinite loop when dumping the qdisc properties, at least on one target (NXP LS1028 ARDB). Removing the duplicate call to qdisc_hash_add() solves the problem. Fixes: `13511704f8` ("net: taprio offload: enforce qdisc to netdev queue mapping") Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-02 11:47:35 +01:00
Yajun Deng	9d85a6f44b	net: sched: cls_api: Fix the the wrong parameter The 4th parameter in tc_chain_notify() should be flags rather than seq. Let's change it back correctly. Fixes: `32a4f5ecd7` ("net: sched: introduce chain object to uapi") Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:48:03 -07:00
Peilin Ye	727d6a8b7e	net/sched: act_skbmod: Skip non-Ethernet packets Currently tcf_skbmod_act() assumes that packets use Ethernet as their L2 protocol, which is not always the case. As an example, for CAN devices: $ ip link add dev vcan0 type vcan $ ip link set up vcan0 $ tc qdisc add dev vcan0 root handle 1: htb $ tc filter add dev vcan0 parent 1: protocol ip prio 10 \ matchall action skbmod swap mac Doing the above silently corrupts all the packets. Do not perform skbmod actions for non-Ethernet packets. Fixes: `86da71b573` ("net_sched: Introduce skbmod action") Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 07:13:51 -07:00
Pavel Skripkin	f5051bcece	net: sched: fix memory leak in tcindex_partial_destroy_work Syzbot reported memory leak in tcindex_set_parms(). The problem was in non-freed perfect hash in tcindex_partial_destroy_work(). In tcindex_set_parms() new tcindex_data is allocated and some fields from old one are copied to new one, but not the perfect hash. Since tcindex_partial_destroy_work() is the destroy function for old tcindex_data, we need to free perfect hash to avoid memory leak. Reported-and-tested-by: syzbot+f0bbb2287b8993d4fa74@syzkaller.appspotmail.com Fixes: `331b72922c` ("net: sched: RCU cls_tcindex") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-18 09:45:46 -07:00
Louis Peens	77ac5e40c4	net/sched: act_ct: remove and free nf_table callbacks When cleaning up the nf_table in tcf_ct_flow_table_cleanup_work there is no guarantee that the callback list, added to by nf_flow_table_offload_add_cb, is empty. This means that it is possible that the flow_block_cb memory allocated will be lost. Fix this by iterating the list and freeing the flow_block_cb entries before freeing the nf_table entry (via freeing ct_ft). Fixes: `978703f425` ("netfilter: flowtable: Add API for registering to flow table events") Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-02 13:36:35 -07:00
wenxu	8955b90c3c	net/sched: act_ct: fix err check for nf_conntrack_confirm The confirm operation should be checked. If there are any failed, the packet should be dropped like in ovs and netfilter. Fixes: `b57dc7c13e` ("net/sched: Introduce action ct") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-02 12:07:08 -07:00
Jesper Dangaard Brouer	633fa66640	net/sched: sch_taprio: fix typo in comment I have checked that the IEEE standard 802.1Q-2018 section 8.6.9.4.5 is called AdminGateStates. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-02 11:55:28 -07:00
Jakub Kicinski	b6df00789e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-06-29 15:45:27 -07:00
David S. Miller	e1289cfb63	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2021-06-28 The following pull-request contains BPF updates for your net-next tree. We've added 37 non-merge commits during the last 12 day(s) which contain a total of 56 files changed, 394 insertions(+), 380 deletions(-). The main changes are: 1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney. 2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski. 3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev. 4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria. 5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards. 6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa. 7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi. 8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim. 9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young. 10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer. 11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski. 12) Fix up bpfilter startup log-level to info level, from Gary Lin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 15:28:03 -07:00
Pavel Skripkin	3f2db25009	net: sched: fix warning in tcindex_alloc_perfect_hash Syzbot reported warning in tcindex_alloc_perfect_hash. The problem was in too big cp->hash, which triggers warning in kmalloc. Since cp->hash comes from userspace, there is no need to warn if value is not correct Fixes: `b9a24bb76b` ("net_sched: properly handle failure case of tcf_exts_init()") Reported-and-tested-by: syzbot+1071ad60cd7df39fdadb@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:03:29 -07:00
Toke Høiland-Jørgensen	77151ccf10	bpf, sched: Remove unneeded rcu_read_lock() around BPF program invocation The rcu_read_lock() call in cls_bpf and act_bpf are redundant: on the TX side, there's already a call to rcu_read_lock_bh() in __dev_queue_xmit(), and on RX there's a covering rcu_read_lock() in netif_receive_skb{,_list}_internal(). With the previous patches we also amended the lockdep checks in the map code to not require any particular RCU flavour, so we can just get rid of the rcu_read_lock()s. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210624160609.292325-7-toke@redhat.com	2021-06-24 19:43:11 +02:00
Yunsheng Lin	d3e0f57501	net: sched: remove qdisc->empty for lockless qdisc As MISSED and DRAINING state are used to indicate a non-empty qdisc, qdisc->empty is not longer needed, so remove it. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-23 12:17:35 -07:00
Yunsheng Lin	c4fef01ba4	net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK flag set, but queue discipline by-pass does not work for lockless qdisc because skb is always enqueued to qdisc even when the qdisc is empty, see __dev_xmit_skb(). This patch calls sch_direct_xmit() to transmit the skb directly to the driver for empty lockless qdisc, which aviod enqueuing and dequeuing operation. As qdisc->empty is not reliable to indicate a empty qdisc because there is a time window between enqueuing and setting qdisc->empty. So we use the MISSED state added in commit `a90c57f2ce` ("net: sched: fix packet stuck problem for lockless qdisc"), which indicate there is lock contention, suggesting that it is better not to do the qdisc bypass in order to avoid packet out of order problem. In order to make MISSED state reliable to indicate a empty qdisc, we need to ensure that testing and clearing of MISSED state is within the protection of qdisc->seqlock, only setting MISSED state can be done without the protection of qdisc->seqlock. A MISSED state testing is added without the protection of qdisc->seqlock to aviod doing unnecessary spin_trylock() for contention case. As the enqueuing is not within the protection of qdisc->seqlock, there is still a potential data race as mentioned by Jakub [1]: thread1 thread2 thread3 qdisc_run_begin() # true qdisc_run_begin(q) set(MISSED) pfifo_fast_dequeue clear(MISSED) # recheck the queue qdisc_run_end() enqueue skb1 qdisc empty # true qdisc_run_begin() # true sch_direct_xmit() # skb2 qdisc_run_begin() set(MISSED) When above happens, skb1 enqueued by thread2 is transmited after skb2 is transmited by thread3 because MISSED state setting and enqueuing is not under the qdisc->seqlock. If qdisc bypass is disabled, skb1 has better chance to be transmited quicker than skb2. This patch does not take care of the above data race, because we view this as similar as below: Even at the same time CPU1 and CPU2 write the skb to two socket which both heading to the same qdisc, there is no guarantee that which skb will hit the qdisc first, because there is a lot of factor like interrupt/softirq/cache miss/scheduling afffecting that. There are below cases that need special handling: 1. When MISSED state is cleared before another round of dequeuing in pfifo_fast_dequeue(), and __qdisc_run() might not be able to dequeue all skb in one round and call __netif_schedule(), which might result in a non-empty qdisc without MISSED set. In order to avoid this, the MISSED state is set for lockless qdisc and __netif_schedule() will be called at the end of qdisc_run_end. 2. The MISSED state also need to be set for lockless qdisc instead of calling __netif_schedule() directly when requeuing a skb for a similar reason. 3. For netdev queue stopped case, the MISSED case need clearing while the netdev queue is stopped, otherwise there may be unnecessary __netif_schedule() calling. So a new DRAINING state is added to indicate this case, which also indicate a non-empty qdisc. 4. As there is already netif_xmit_frozen_or_stopped() checking in dequeue_skb() and sch_direct_xmit(), which are both within the protection of qdisc->seqlock, but the same checking in __dev_xmit_skb() is without the protection, which might cause empty indication of a lockless qdisc to be not reliable. So remove the checking in __dev_xmit_skb(), and the checking in the protection of qdisc->seqlock seems enough to avoid the cpu consumption problem for netdev queue stopped case. 1. https://lkml.org/lkml/2021/5/29/215 Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-23 12:17:35 -07:00
Eric Dumazet	0cd58e5c53	pkt_sched: sch_qfq: fix qfq_change_class() error path If qfq_change_class() is unable to allocate memory for qfq_aggregate, it frees the class that has been inserted in the class hash table, but does not unhash it. Defer the insertion after the problematic allocation. BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:884 [inline] BUG: KASAN: use-after-free in qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731 Write of size 8 at addr ffff88814a534f10 by task syz-executor.4/31478 CPU: 0 PID: 31478 Comm: syz-executor.4 Not tainted 5.13.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436 hlist_add_head include/linux/list.h:884 [inline] qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731 qfq_change_class+0x96c/0x1990 net/sched/sch_qfq.c:489 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665d9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdc7b5f0188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 00007fdc7b5f01d0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 00007ffcf7310b3f R14: 00007fdc7b5f0300 R15: 0000000000022000 Allocated by task 31445: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:428 [inline] ____kasan_kmalloc mm/kasan/common.c:507 [inline] ____kasan_kmalloc mm/kasan/common.c:466 [inline] __kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:516 kmalloc include/linux/slab.h:556 [inline] kzalloc include/linux/slab.h:686 [inline] qfq_change_class+0x705/0x1990 net/sched/sch_qfq.c:464 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 31445: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357 ____kasan_slab_free mm/kasan/common.c:360 [inline] ____kasan_slab_free mm/kasan/common.c:325 [inline] __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368 kasan_slab_free include/linux/kasan.h:212 [inline] slab_free_hook mm/slub.c:1583 [inline] slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1608 slab_free mm/slub.c:3168 [inline] kfree+0xe5/0x7f0 mm/slub.c:4212 qfq_change_class+0x10fb/0x1990 net/sched/sch_qfq.c:518 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88814a534f00 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 16 bytes inside of 128-byte region [ffff88814a534f00, ffff88814a534f80) The buggy address belongs to the page: page:ffffea0005294d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14a534 flags: 0x57ff00000000200(slab\|node=1\|zone=2\|lastcpupid=0x7ff) raw: 057ff00000000200 ffffea00004fee00 0000000600000006 ffff8880110418c0 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL\|__GFP_NOWARN\|__GFP_NORETRY), pid 29797, ts 604817765317, free_ts 604810151744 prep_new_page mm/page_alloc.c:2358 [inline] get_page_from_freelist+0x1033/0x2b60 mm/page_alloc.c:3994 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200 alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272 alloc_slab_page mm/slub.c:1646 [inline] allocate_slab+0x2c5/0x4c0 mm/slub.c:1786 new_slab mm/slub.c:1849 [inline] new_slab_objects mm/slub.c:2595 [inline] ___slab_alloc+0x4a1/0x810 mm/slub.c:2758 __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2798 slab_alloc_node mm/slub.c:2880 [inline] slab_alloc mm/slub.c:2922 [inline] __kmalloc+0x315/0x330 mm/slub.c:4050 kmalloc include/linux/slab.h:561 [inline] kzalloc include/linux/slab.h:686 [inline] __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318 mpls_dev_sysctl_register+0x1b7/0x2d0 net/mpls/af_mpls.c:1421 mpls_add_dev net/mpls/af_mpls.c:1472 [inline] mpls_dev_notify+0x214/0x8b0 net/mpls/af_mpls.c:1588 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2121 call_netdevice_notifiers_extack net/core/dev.c:2133 [inline] call_netdevice_notifiers net/core/dev.c:2147 [inline] register_netdevice+0x106b/0x1500 net/core/dev.c:10312 veth_newlink+0x585/0xac0 drivers/net/veth.c:1547 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3452 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1298 [inline] free_pcp_prepare+0x223/0x300 mm/page_alloc.c:1342 free_unref_page_prepare mm/page_alloc.c:3250 [inline] free_unref_page+0x12/0x1d0 mm/page_alloc.c:3298 __vunmap+0x783/0xb60 mm/vmalloc.c:2566 free_work+0x58/0x70 mm/vmalloc.c:80 process_one_work+0x98d/0x1600 kernel/workqueue.c:2276 worker_thread+0x64c/0x1120 kernel/workqueue.c:2422 kthread+0x3b1/0x4a0 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Memory state around the buggy address: ffff88814a534e00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88814a534e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88814a534f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88814a534f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88814a535000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: `462dbc9101` ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-21 14:50:19 -07:00
Boris Sukholitko	6d5516177d	Revert "net/sched: cls_flower: Remove match on n_proto" This reverts commit `0dca2c7404`. The commit in question breaks hardware offload of flower filters. Quoting Vladimir Oltean <olteanv@gmail.com>: fl_hw_replace_filter() and fl_reoffload() create a struct flow_cls_offload with a rule->match.mask member derived from the mask of the software classifier: &f->mask->key - that same mask that is used for initializing the flow dissector keys, and the one from which Boris removed the basic.n_proto member because it was bothering him. Reported-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-21 14:46:36 -07:00
Jakub Kicinski	adc2e56ebe	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Trivial conflicts in net/can/isotp.c and tools/testing/selftests/net/mptcp/mptcp_connect.sh scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c to include/linux/ptp_clock_kernel.h in -next so re-apply the fix there. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-06-18 19:47:02 -07:00
Yang Yingliang	55d96f72e8	net: sched: fix error return code in tcf_del_walker() When nla_put_u32() fails, 'ret' could be 0, it should return error code in tcf_del_walker(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-17 11:36:18 -07:00
Boris Sukholitko	0dca2c7404	net/sched: cls_flower: Remove match on n_proto The following flower filters fail to match packets: tc filter add dev eth0 ingress protocol 0x8864 flower \ action simple sdata hi64 tc filter add dev eth0 ingress protocol 802.1q flower \ vlan_ethtype 0x8864 action simple sdata "hi vlan" The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is being dissected by __skb_flow_dissect and it's internal protocol is being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES tunnel is transparent to the callers of __skb_flow_dissect. OTOH, in the filters above, cls_flower configures its key->basic.n_proto to the ETH_P_PPP_SES value configured by the user. Matching on this key fails because of __skb_flow_dissect "transparency" mentioned above. In the following, I would argue that the problem lies with cls_flower, unnessary attempting key->basic.n_proto match. There are 3 close places in fl_set_key in cls_flower setting up mask->basic.n_proto. They are (in reverse order of appearance in the code) due to: (a) No vlan is given: use TCA_FLOWER_KEY_ETH_TYPE parameter (b) One vlan tag is given: use TCA_FLOWER_KEY_VLAN_ETH_TYPE (c) Two vlans are given: use TCA_FLOWER_KEY_CVLAN_ETH_TYPE The match in case (a) is unneeded because flower has no its own eth_type parameter. It was removed by Jamal Hadi Salim in commit 488b41d020fb06428b90289f70a41210718f52b7 in iproute2. For TCA_FLOWER_KEY_ETH_TYPE the userspace uses the generic tc filter protocol field. Therefore the match for the case (a) is done by tc itself. The matches in cases (b), (c) are unneeded because the protocol will appear in and will be matched by flow_dissector_key_vlan.vlan_tpid. Therefore in the best case, key->basic.n_proto will try to repeat vlan key match again. The below patch removes mask->basic.n_proto setting and resets it to 0 in case (c). Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:26:51 -07:00
Tyson Moore	4f667b8e04	sch_cake: revise docs for RFC 8622 LE PHB support Commit `b8392808eb` ("sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling") added the LE mark to the Bulk tin. Update the comments to reflect the change. Signed-off-by: Tyson Moore <tyson@tyson.me> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-14 12:10:57 -07:00
Maxim Mikityanskiy	ba91c49ded	sch_cake: Fix out of bounds when parsing TCP options and header The TCP option parser in cake qdisc (cake_get_tcpopt and cake_tcph_may_drop) could read one byte out of bounds. When the length is 1, the execution flow gets into the loop, reads one byte of the opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the length of 1. This fix is inspired by commit `9609dad263` ("ipv4: tcp_input: fix stack out of bounds when parsing TCP options."). v2 changes: Added doff validation in cake_get_tcphdr to avoid parsing garbage as TCP header. Although it wasn't strictly an out-of-bounds access (memory was allocated), garbage values could be read where CAKE expected the TCP header if doff was smaller than 5. Cc: Young Xiao <92siuyang@gmail.com> Fixes: `8b7138814f` ("sch_cake: Add optional ACK filter") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-10 14:26:18 -07:00
Marcelo Ricardo Leitner	13c62f5371	net/sched: act_ct: handle DNAT tuple collision This this the counterpart of `8aa7b526dc` ("openvswitch: handle DNAT tuple collision") for act_ct. From that commit changelog: """ With multiple DNAT rules it's possible that after destination translation the resulting tuples collide. ... Netfilter handles this case by allocating a null binding for SNAT at egress by default. Perform the same operation in openvswitch for DNAT if no explicit SNAT is requested by the user and allocate a null binding for SNAT for packets in the "original" direction. """ Fixes: `95219afbb9` ("act_ct: support asymmetric conntrack") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-09 15:34:51 -07:00
David S. Miller	126285651b	Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net Bug fixes overlapping feature additions and refactoring, mostly. Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 13:01:52 -07:00
Yu Kuai	9977d6f56b	sch_htb: fix doc warning in htb_lookup_leaf() Add description for parameters of htb_lookup_leaf() to fix gcc W=1 warnings: net/sched/sch_htb.c:773: warning: Function parameter or member 'hprio' not described in 'htb_lookup_leaf' net/sched/sch_htb.c:773: warning: Function parameter or member 'prio' not described in 'htb_lookup_leaf' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:41 -07:00
Yu Kuai	2c3ee53ea6	sch_htb: fix doc warning in htb_do_events() Add description for parameters of htb_do_events() to fix gcc W=1 warnings: net/sched/sch_htb.c:708: warning: Function parameter or member 'q' not described in 'htb_do_events' net/sched/sch_htb.c:708: warning: Function parameter or member 'level' not described in 'htb_do_events' net/sched/sch_htb.c:708: warning: Function parameter or member 'start' not described in 'htb_do_events' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:41 -07:00
Yu Kuai	0e5c90848a	sch_htb: fix doc warning in htb_charge_class() Add description for parameters of htb_charge_class() to fix gcc W=1 warnings: net/sched/sch_htb.c:663: warning: Function parameter or member 'q' not described in 'htb_charge_class' net/sched/sch_htb.c:663: warning: Function parameter or member 'cl' not described in 'htb_charge_class' net/sched/sch_htb.c:663: warning: Function parameter or member 'level' not described in 'htb_charge_class' net/sched/sch_htb.c:663: warning: Function parameter or member 'skb' not described in 'htb_charge_class' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:41 -07:00
Yu Kuai	9a034f25e4	sch_htb: fix doc warning in htb_deactivate() Add description for parameters of htb_deactivate() to fix gcc W=1 warnings: net/sched/sch_htb.c:578: warning: Function parameter or member 'q' not described in 'htb_deactivate' net/sched/sch_htb.c:578: warning: Function parameter or member 'cl' not described in 'htb_deactivate' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:41 -07:00
Yu Kuai	8df7e8fff8	sch_htb: fix doc warning in htb_activate() Add description for parameters of htb_activate() to fix gcc W=1 warnings: net/sched/sch_htb.c:562: warning: Function parameter or member 'q' not described in 'htb_activate' net/sched/sch_htb.c:562: warning: Function parameter or member 'cl' not described in 'htb_activate' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:41 -07:00
Yu Kuai	4b479e9883	sch_htb: fix doc warning in htb_change_class_mode() Add description for parameters of htb_change_class_mode() to fix gcc W=1 warnings: net/sched/sch_htb.c:533: warning: Function parameter or member 'q' not described in 'htb_change_class_mode' net/sched/sch_htb.c:533: warning: Function parameter or member 'cl' not described in 'htb_change_class_mode' net/sched/sch_htb.c:533: warning: Function parameter or member 'diff' not described in 'htb_change_class_mode' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:40 -07:00
Yu Kuai	1e9559527a	sch_htb: fix doc warning in htb_class_mode() Add description for parameters of htb_class_mode() to fix gcc W=1 warnings: net/sched/sch_htb.c:507: warning: Function parameter or member 'cl' not described in 'htb_class_mode' net/sched/sch_htb.c:507: warning: Function parameter or member 'diff' not described in 'htb_class_mode' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:40 -07:00
Yu Kuai	4113be2020	sch_htb: fix doc warning in htb_deactivate_prios() Add description for parameters of htb_deactivate_prios() to fix gcc W=1 warnings: net/sched/sch_htb.c:442: warning: Function parameter or member 'q' not described in 'htb_deactivate_prios' net/sched/sch_htb.c:442: warning: Function parameter or member 'cl' not described in 'htb_deactivate_prios' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:40 -07:00
Yu Kuai	876b5fc0c0	sch_htb: fix doc warning in htb_activate_prios() Add description for parameters of htb_activate_prios() to fix gcc W=1 warnings: net/sched/sch_htb.c:407: warning: Function parameter or member 'q' not described in 'htb_activate_prios' net/sched/sch_htb.c:407: warning: Function parameter or member 'cl' not described in 'htb_activate_prios' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:40 -07:00
Yu Kuai	5f8c6d05f3	sch_htb: fix doc warning in htb_remove_class_from_row() Add description for parameters of htb_remove_class_from_row() to fix gcc W=1 warnings: net/sched/sch_htb.c:380: warning: Function parameter or member 'q' not described in 'htb_remove_class_from_row' net/sched/sch_htb.c:380: warning: Function parameter or member 'cl' not described in 'htb_remove_class_from_row' net/sched/sch_htb.c:380: warning: Function parameter or member 'mask' not described in 'htb_remove_class_from_row' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:40 -07:00
Yu Kuai	996bccc39a	sch_htb: fix doc warning in htb_add_class_to_row() Add description for parameters of htb_add_class_to_row() to fix gcc W=1 warnings: net/sched/sch_htb.c:351: warning: Function parameter or member 'q' not described in 'htb_add_class_to_row' net/sched/sch_htb.c:351: warning: Function parameter or member 'cl' not described in 'htb_add_class_to_row' net/sched/sch_htb.c:351: warning: Function parameter or member 'mask' not described in 'htb_add_class_to_row' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:40 -07:00
Yu Kuai	274e5d0e55	sch_htb: fix doc warning in htb_next_rb_node() Add description for parameters of htb_next_rb_node() to fix gcc W=1 warnings: net/sched/sch_htb.c:339: warning: Function parameter or member 'n' not described in 'htb_next_rb_node' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:40 -07:00
Yu Kuai	4d7efa73fa	sch_htb: fix doc warning in htb_add_to_wait_tree() Add description for parameters of htb_add_to_wait_tree() to fix gcc W=1 warnings: net/sched/sch_htb.c:308: warning: Function parameter or member 'q' not described in 'htb_add_to_wait_tree' net/sched/sch_htb.c:308: warning: Function parameter or member 'cl' not described in 'htb_add_to_wait_tree' net/sched/sch_htb.c:308: warning: Function parameter or member 'delay' not described in 'htb_add_to_wait_tree' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-07 12:28:40 -07:00
Yunjian Wang	944d671d5f	sch_htb: fix refcount leak in htb_parent_to_leaf_offload The commit `ae81feb733` ("sch_htb: fix null pointer dereference on a null new_q") fixes a NULL pointer dereference bug, but it is not correct. Because htb_graft_helper properly handles the case when new_q is NULL, and after the previous patch by skipping this call which creates an inconsistency : dev_queue->qdisc will still point to the old qdisc, but cl->parent->leaf.q will point to the new one (which will be noop_qdisc, because new_q was NULL). The code is based on an assumption that these two pointers are the same, so it can lead to refcount leaks. The correct fix is to add a NULL pointer check to protect qdisc_refcount_inc inside htb_parent_to_leaf_offload. Fixes: `ae81feb733` ("sch_htb: fix null pointer dereference on a null new_q") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Suggested-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-04 14:44:18 -07:00
Yu Kuai	a10541f5d9	sch_htb: fix doc warning in htb_add_to_id_tree() Add description for parameters of htb_add_to_id_tree() to fix gcc W=1 warnings: net/sched/sch_htb.c:282: warning: Function parameter or member 'root' not described in 'htb_add_to_id_tree' net/sched/sch_htb.c:282: warning: Function parameter or member 'cl' not described in 'htb_add_to_id_tree' net/sched/sch_htb.c:282: warning: Function parameter or member 'prio' not described in 'htb_add_to_id_tree' Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-03 15:26:32 -07:00
Jakub Kicinski	490dcecabb	mlx5: count all link events mlx5 devices were observed generating MLX5_PORT_CHANGE_SUBTYPE_ACTIVE events without an intervening MLX5_PORT_CHANGE_SUBTYPE_DOWN. This breaks link flap detection based on Linux carrier state transition count as netif_carrier_on() does nothing if carrier is already on. Make sure we count such events. netif_carrier_event() increments the counters and fires the linkwatch events. The latter is not necessary for the use case but seems like the right thing to do. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-03 13:10:17 -07:00
Boris Sukholitko	8323b20f1d	net/sched: act_vlan: No dump for unset priority Dump vlan priority only if it has been previously set. Fix the tests accordingly. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-01 16:54:42 -07:00
Boris Sukholitko	9c5eee0afc	net/sched: act_vlan: Fix modify to allow 0 Currently vlan modification action checks existence of vlan priority by comparing it to 0. Therefore it is impossible to modify existing vlan tag to have priority 0. For example, the following tc command will change the vlan id but will not affect vlan priority: tc filter add dev eth1 ingress matchall action vlan modify id 300 \ priority 0 pipe mirred egress redirect dev eth2 The incoming packet on eth1: ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4 will be changed to: ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4 although the user has intended to have p == 0. The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params and rely on it when deciding to set the priority. Fixes: `45a497f2d1` (net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action) Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-01 16:54:42 -07:00
Zheng Yongjun	37f2ad2b90	net: sched: Fix spelling mistakes Fix some spelling mistakes in comments: sevaral ==> several sugestion ==> suggestion unregster ==> unregister suplied ==> supplied cirsumstances ==> circumstances Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Link: https://lore.kernel.org/r/20210531020048.2920054-1-zhengyongjun3@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-05-31 22:44:56 -07:00
Ariel Levkovich	fb91702b74	net/sched: act_ct: Fix ct template allocation for zone 0 Fix current behavior of skipping template allocation in case the ct action is in zone 0. Skipping the allocation may cause the datapath ct code to ignore the entire ct action with all its attributes (commit, nat) in case the ct action in zone 0 was preceded by a ct clear action. The ct clear action sets the ct_state to untracked and resets the skb->_nfct pointer. Under these conditions and without an allocated ct template, the skb->_nfct pointer will remain NULL which will cause the tc ct action handler to exit without handling commit and nat actions, if such exist. For example, the following rule in OVS dp: recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \ in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \ recirc(0x37a) Will result in act_ct skipping the commit and nat actions in zone 0. The change removes the skipping of template allocation for zone 0 and treats it the same as any other zone. Fixes: `b57dc7c13e` ("net/sched: Introduce action ct") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20210526170110.54864-1-lariel@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-05-27 14:54:23 -07:00
Paul Blakey	0cc254e5aa	net/sched: act_ct: Offload connections with commit action Currently established connections are not offloaded if the filter has a "ct commit" action. This behavior will not offload connections of the following scenario: $ tc_filter add dev $DEV ingress protocol ip prio 1 flower \ ct_state -trk \ action ct commit action goto chain 1 $ tc_filter add dev $DEV ingress protocol ip chain 1 prio 1 flower \ action mirred egress redirect dev $DEV2 $ tc_filter add dev $DEV2 ingress protocol ip prio 1 flower \ action ct commit action goto chain 1 $ tc_filter add dev $DEV2 ingress protocol ip prio 1 chain 1 flower \ ct_state +trk+est \ action mirred egress redirect dev $DEV Offload established connections, regardless of the commit flag. Fixes: `46475bb20f` ("net/sched: act_ct: Software offload of established flows") Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Link: https://lore.kernel.org/r/1622029449-27060-1-git-send-email-paulb@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-05-27 14:54:14 -07:00
Jakub Kicinski	5ada57a9a6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net cdc-wdm: s/kill_urbs/poison_urbs/ to fix build Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-05-27 09:55:10 -07:00
Vlad Buslov	9453d45ecb	net: zero-initialize tc skb extension on allocation Function skb_ext_add() doesn't initialize created skb extension with any value and leaves it up to the user. However, since extension of type TC_SKB_EXT originally contained only single value tc_skb_ext->chain its users used to just assign the chain value without setting whole extension memory to zero first. This assumption changed when TC_SKB_EXT extension was extended with additional fields but not all users were updated to initialize the new fields which leads to use of uninitialized memory afterwards. UBSAN log: [ 778.299821] UBSAN: invalid-load in net/openvswitch/flow.c:899:28 [ 778.301495] load of value 107 is not a valid value for type '_Bool' [ 778.303215] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7+ #2 [ 778.304933] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 778.307901] Call Trace: [ 778.308680] <IRQ> [ 778.309358] dump_stack+0xbb/0x107 [ 778.310307] ubsan_epilogue+0x5/0x40 [ 778.311167] __ubsan_handle_load_invalid_value.cold+0x43/0x48 [ 778.312454] ? memset+0x20/0x40 [ 778.313230] ovs_flow_key_extract.cold+0xf/0x14 [openvswitch] [ 778.314532] ovs_vport_receive+0x19e/0x2e0 [openvswitch] [ 778.315749] ? ovs_vport_find_upcall_portid+0x330/0x330 [openvswitch] [ 778.317188] ? create_prof_cpu_mask+0x20/0x20 [ 778.318220] ? arch_stack_walk+0x82/0xf0 [ 778.319153] ? secondary_startup_64_no_verify+0xb0/0xbb [ 778.320399] ? stack_trace_save+0x91/0xc0 [ 778.321362] ? stack_trace_consume_entry+0x160/0x160 [ 778.322517] ? lock_release+0x52e/0x760 [ 778.323444] netdev_frame_hook+0x323/0x610 [openvswitch] [ 778.324668] ? ovs_netdev_get_vport+0xe0/0xe0 [openvswitch] [ 778.325950] __netif_receive_skb_core+0x771/0x2db0 [ 778.327067] ? lock_downgrade+0x6e0/0x6f0 [ 778.328021] ? lock_acquire+0x565/0x720 [ 778.328940] ? generic_xdp_tx+0x4f0/0x4f0 [ 778.329902] ? inet_gro_receive+0x2a7/0x10a0 [ 778.330914] ? lock_downgrade+0x6f0/0x6f0 [ 778.331867] ? udp4_gro_receive+0x4c4/0x13e0 [ 778.332876] ? lock_release+0x52e/0x760 [ 778.333808] ? dev_gro_receive+0xcc8/0x2380 [ 778.334810] ? lock_downgrade+0x6f0/0x6f0 [ 778.335769] __netif_receive_skb_list_core+0x295/0x820 [ 778.336955] ? process_backlog+0x780/0x780 [ 778.337941] ? mlx5e_rep_tc_netdevice_event_unregister+0x20/0x20 [mlx5_core] [ 778.339613] ? seqcount_lockdep_reader_access.constprop.0+0xa7/0xc0 [ 778.341033] ? kvm_clock_get_cycles+0x14/0x20 [ 778.342072] netif_receive_skb_list_internal+0x5f5/0xcb0 [ 778.343288] ? __kasan_kmalloc+0x7a/0x90 [ 778.344234] ? mlx5e_handle_rx_cqe_mpwrq+0x9e0/0x9e0 [mlx5_core] [ 778.345676] ? mlx5e_xmit_xdp_frame_mpwqe+0x14d0/0x14d0 [mlx5_core] [ 778.347140] ? __netif_receive_skb_list_core+0x820/0x820 [ 778.348351] ? mlx5e_post_rx_mpwqes+0xa6/0x25d0 [mlx5_core] [ 778.349688] ? napi_gro_flush+0x26c/0x3c0 [ 778.350641] napi_complete_done+0x188/0x6b0 [ 778.351627] mlx5e_napi_poll+0x373/0x1b80 [mlx5_core] [ 778.352853] __napi_poll+0x9f/0x510 [ 778.353704] ? mlx5_flow_namespace_set_mode+0x260/0x260 [mlx5_core] [ 778.355158] net_rx_action+0x34c/0xa40 [ 778.356060] ? napi_threaded_poll+0x3d0/0x3d0 [ 778.357083] ? sched_clock_cpu+0x18/0x190 [ 778.358041] ? __common_interrupt+0x8e/0x1a0 [ 778.359045] __do_softirq+0x1ce/0x984 [ 778.359938] __irq_exit_rcu+0x137/0x1d0 [ 778.360865] irq_exit_rcu+0xa/0x20 [ 778.361708] common_interrupt+0x80/0xa0 [ 778.362640] </IRQ> [ 778.363212] asm_common_interrupt+0x1e/0x40 [ 778.364204] RIP: 0010:native_safe_halt+0xe/0x10 [ 778.365273] Code: 4f ff ff ff 4c 89 e7 e8 50 3f 40 fe e9 dc fe ff ff 48 89 df e8 43 3f 40 fe eb 90 cc e9 07 00 00 00 0f 00 2d 74 05 62 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 64 05 62 00 f4 c3 cc cc 0f 1f 44 00 [ 778.369355] RSP: 0018:ffffffff84407e48 EFLAGS: 00000246 [ 778.370570] RAX: ffff88842de46a80 RBX: ffffffff84425840 RCX: ffffffff83418468 [ 778.372143] RDX: 000000000026f1da RSI: 0000000000000004 RDI: ffffffff8343af5e [ 778.373722] RBP: fffffbfff0884b08 R08: 0000000000000000 R09: ffff88842de46bcb [ 778.375292] R10: ffffed1085bc8d79 R11: 0000000000000001 R12: 0000000000000000 [ 778.376860] R13: ffffffff851124a0 R14: 0000000000000000 R15: dffffc0000000000 [ 778.378491] ? rcu_eqs_enter.constprop.0+0xb8/0xe0 [ 778.379606] ? default_idle_call+0x5e/0xe0 [ 778.380578] default_idle+0xa/0x10 [ 778.381406] default_idle_call+0x96/0xe0 [ 778.382350] do_idle+0x3d4/0x550 [ 778.383153] ? arch_cpu_idle_exit+0x40/0x40 [ 778.384143] cpu_startup_entry+0x19/0x20 [ 778.385078] start_kernel+0x3c7/0x3e5 [ 778.385978] secondary_startup_64_no_verify+0xb0/0xbb Fix the issue by providing new function tc_skb_ext_alloc() that allocates tc skb extension and initializes its memory to 0 before returning it to the caller. Change all existing users to use new API instead of calling skb_ext_add() directly. Fixes: `038ebb1a71` ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct") Fixes: `d29334c15d` ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-25 15:36:42 -07:00
Taehee Yoo	9b76eade16	sch_dsmark: fix a NULL deref in qdisc_reset() If Qdisc_ops->init() is failed, Qdisc_ops->reset() would be called. When dsmark_init(Qdisc_ops->init()) is failed, it possibly doesn't initialize dsmark_qdisc_data->q. But dsmark_reset(Qdisc_ops->reset()) uses dsmark_qdisc_data->q pointer wihtout any null checking. So, panic would occur. Test commands: sysctl net.core.default_qdisc=dsmark -w ip link add dummy0 type dummy ip link add vw0 link dummy0 type virt_wifi ip link set vw0 up Splat looks like: KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 3 PID: 684 Comm: ip Not tainted 5.12.0+ #910 RIP: 0010:qdisc_reset+0x2b/0x680 Code: 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 41 56 41 55 41 54 55 48 89 fd 48 83 c7 18 53 48 89 fa 48 c1 ea 03 48 83 ec 20 <80> 3c 02 00 0f 85 09 06 00 00 4c 8b 65 18 0f 1f 44 00 00 65 8b 1d RSP: 0018:ffff88800fda6bf8 EFLAGS: 00010282 RAX: dffffc0000000000 RBX: ffff8880050ed800 RCX: 0000000000000000 RDX: 0000000000000003 RSI: ffffffff99e34100 RDI: 0000000000000018 RBP: 0000000000000000 R08: fffffbfff346b553 R09: fffffbfff346b553 R10: 0000000000000001 R11: fffffbfff346b552 R12: ffffffffc0824940 R13: ffff888109e83800 R14: 00000000ffffffff R15: ffffffffc08249e0 FS: 00007f5042287680(0000) GS:ffff888119800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055ae1f4dbd90 CR3: 0000000006760002 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? rcu_read_lock_bh_held+0xa0/0xa0 dsmark_reset+0x3d/0xf0 [sch_dsmark] qdisc_reset+0xa9/0x680 qdisc_destroy+0x84/0x370 qdisc_create_dflt+0x1fe/0x380 attach_one_default_qdisc.constprop.41+0xa4/0x180 dev_activate+0x4d5/0x8c0 ? __dev_open+0x268/0x390 __dev_open+0x270/0x390 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-24 13:11:44 -07:00

1 2 3 4 5 ...

3240 Commits