OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
David S. Miller	ab6dd952b2	Merge branch 'smc-RDMA-net-namespace' Tony Lu says: ==================== RDMA device net namespace support for SMC This patch set introduces net namespace support for linkgroups. Path 1 is the main approach to implement net ns support. Path 2 - 4 are the additional modifications to let us know the netns. Also, I will submit changes of smc-tools to github later. Currently, smc doesn't support net namespace isolation. The ibdevs registered to smc are shared for all linkgroups and connections. When running applications in different net namespaces, such as container environment, applications should only use the ibdevs that belongs to the same net namespace. This adds a new field, net, in smc linkgroup struct. During first contact, it checks and find the linkgroup has same net namespace, if not, it is going to create and initialized the net field with first link's ibdev net namespace. When finding the rdma devices, it also checks the sk net device's and ibdev's net namespaces. After net namespace destroyed, the net device and ibdev move to root net namespace, linkgroups won't be matched, and wait for lgr free. If rdma net namespace exclusive mode is not enabled, it behaves as before. Steps to enable and test net namespaces: 1. enable RDMA device net namespace exclusive support rdma system set netns exclusive # default is shared 2. create new net namespace, move and initialize them ip netns add test1 rdma dev set mlx5_1 netns test1 ip link set dev eth2 netns test1 ip netns exec test1 ip link set eth2 up ip netns exec test1 ip addr add ${HOST_IP}/26 dev eth2 3. setup server and client, connect N <-> M ip netns exec test1 smc_run sockperf server --tcp # server ip netns exec test1 smc_run sockperf pp --tcp -i ${SERVER_IP} # client 4. netns isolated linkgroups (2 * 2 mesh) with their own linkgroups - server LG-ID LG-Role LG-Type VLAN #Conns PNET-ID 00000100 SERV SINGLE 0 0 00000200 SERV SINGLE 0 0 00000300 SERV SINGLE 0 0 00000400 SERV SINGLE 0 0 - client LG-ID LG-Role LG-Type VLAN #Conns PNET-ID 00000100 CLNT SINGLE 0 0 00000200 CLNT SINGLE 0 0 00000300 CLNT SINGLE 0 0 00000400 CLNT SINGLE 0 0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-02 12:07:39 +00:00
Tony Lu	a838f50848	net/smc: Add net namespace for tracepoints This prints net namespace ID, helps us to distinguish different net namespaces when using tracepoints. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-02 12:07:39 +00:00
Tony Lu	de2fea7b39	net/smc: Print net namespace in log This adds net namespace ID to the kernel log, net_cookie is unique in the whole system. It is useful in container environment. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-02 12:07:39 +00:00
Tony Lu	79d39fc503	net/smc: Add netlink net namespace support This adds net namespace ID to diag of linkgroup, helps us to distinguish different namespaces, and net_cookie is unique in the whole system. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-02 12:07:39 +00:00
Tony Lu	0237a3a683	net/smc: Introduce net namespace support for linkgroup Currently, rdma device supports exclusive net namespace isolation, however linkgroup doesn't know and support ibdev net namespace. Applications in the containers don't want to share the nics if we enabled rdma exclusive mode. Every net namespaces should have their own linkgroups. This patch introduce a new field net for linkgroup, which is standing for the ibdev net namespace in the linkgroup. The net in linkgroup is initialized with the net namespace of link's ibdev. It compares the net of linkgroup and sock or ibdev before choose it, if no matched, create new one in current net namespace. If rdma net namespace exclusive mode is not enabled, it behaves as before. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-02 12:07:39 +00:00
Linus Lüssing	938f2e0b57	batman-adv: mcast: don't send link-local multicast to mcast routers The addition of routable multicast TX handling introduced a bug/regression for packets with a link-local multicast destination: These packets would be sent to all batman-adv nodes with a multicast router and to all batman-adv nodes with an old version without multicast router detection. This even disregards the batman-adv multicast fanout setting, which can potentially lead to an unwanted, high number of unicast transmissions or even congestion. Fixing this by avoiding to send link-local multicast packets to nodes in the multicast router list. Fixes: `11d458c1cb` ("batman-adv: mcast: apply optimizations for routable packets, too") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2022-01-02 09:31:17 +01:00
Linus Torvalds	278218f677	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Two small fixups for spaceball joystick driver and appletouch touchpad driver" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: spaceball - fix parsing of movement data packets Input: appletouch - initialize work before device registration	2022-01-01 10:21:49 -08:00
Haimin Zhang	d6d8683070	net ticp:fix a kernel-infoleak in __tipc_sendmsg() struct tipc_socket_addr.ref has a 4-byte hole,and __tipc_getname() currently copying it to user space,causing kernel-infoleak. BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] lib/usercopy.c:33 BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 lib/usercopy.c:33 instrument_copy_to_user include/linux/instrumented.h:121 [inline] instrument_copy_to_user include/linux/instrumented.h:121 [inline] lib/usercopy.c:33 _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:209 [inline] copy_to_user include/linux/uaccess.h:209 [inline] net/socket.c:287 move_addr_to_user+0x3f6/0x600 net/socket.c:287 net/socket.c:287 __sys_getpeername+0x470/0x6b0 net/socket.c:1987 net/socket.c:1987 __do_sys_getpeername net/socket.c:1997 [inline] __se_sys_getpeername net/socket.c:1994 [inline] __do_sys_getpeername net/socket.c:1997 [inline] net/socket.c:1994 __se_sys_getpeername net/socket.c:1994 [inline] net/socket.c:1994 __x64_sys_getpeername+0xda/0x120 net/socket.c:1994 net/socket.c:1994 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was stored to memory at: tipc_getname+0x575/0x5e0 net/tipc/socket.c:757 net/tipc/socket.c:757 __sys_getpeername+0x3b3/0x6b0 net/socket.c:1984 net/socket.c:1984 __do_sys_getpeername net/socket.c:1997 [inline] __se_sys_getpeername net/socket.c:1994 [inline] __do_sys_getpeername net/socket.c:1997 [inline] net/socket.c:1994 __se_sys_getpeername net/socket.c:1994 [inline] net/socket.c:1994 __x64_sys_getpeername+0xda/0x120 net/socket.c:1994 net/socket.c:1994 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was stored to memory at: msg_set_word net/tipc/msg.h:212 [inline] msg_set_destport net/tipc/msg.h:619 [inline] msg_set_word net/tipc/msg.h:212 [inline] net/tipc/socket.c:1486 msg_set_destport net/tipc/msg.h:619 [inline] net/tipc/socket.c:1486 __tipc_sendmsg+0x44fa/0x5890 net/tipc/socket.c:1486 net/tipc/socket.c:1486 tipc_sendmsg+0xeb/0x140 net/tipc/socket.c:1402 net/tipc/socket.c:1402 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] sock_sendmsg_nosec net/socket.c:704 [inline] net/socket.c:2409 sock_sendmsg net/socket.c:724 [inline] net/socket.c:2409 ____sys_sendmsg+0xe11/0x12c0 net/socket.c:2409 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] ___sys_sendmsg net/socket.c:2463 [inline] net/socket.c:2492 __sys_sendmsg+0x704/0x840 net/socket.c:2492 net/socket.c:2492 __do_sys_sendmsg net/socket.c:2501 [inline] __se_sys_sendmsg net/socket.c:2499 [inline] __do_sys_sendmsg net/socket.c:2501 [inline] net/socket.c:2499 __se_sys_sendmsg net/socket.c:2499 [inline] net/socket.c:2499 __x64_sys_sendmsg+0xe2/0x120 net/socket.c:2499 net/socket.c:2499 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Local variable skaddr created at: __tipc_sendmsg+0x2d0/0x5890 net/tipc/socket.c:1419 net/tipc/socket.c:1419 tipc_sendmsg+0xeb/0x140 net/tipc/socket.c:1402 net/tipc/socket.c:1402 Bytes 4-7 of 16 are uninitialized Memory access of size 16 starts at ffff888113753e00 Data copied to user address 0000000020000280 Reported-by: syzbot+cdbd40e0c3ca02cae3b7@syzkaller.appspotmail.com Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/1640918123-14547-1-git-send-email-tcs.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-31 18:37:08 -08:00
Jianguo Wu	5e75d0b215	selftests: net: udpgro_fwd.sh: explicitly checking the available ping feature As Paolo pointed out, the result of ping IPv6 address depends on the running distro. So explicitly checking the available ping feature, as e.g. do the bareudp.sh self-tests. Fixes: `8b3170e075` ("selftests: net: using ping6 for IPv6 in udpgro_fwd.sh") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Link: https://lore.kernel.org/r/825ee22b-4245-dbf7-d2f7-a230770d6e21@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-31 18:36:41 -08:00
Jakub Kicinski	0f1fe7b83b	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2021-12-31 We've added 2 non-merge commits during the last 14 day(s) which contain a total of 2 files changed, 3 insertions(+), 3 deletions(-). The main changes are: 1) Revert of an earlier attempt to fix xsk's poll() behavior where it turned out that the fix for a rare problem made it much worse in general, from Magnus Karlsson. (Fyi, Magnus mentioned that a proper fix is coming early next year, so the revert is mainly to avoid slipping the behavior into 5.16.) 2) Minor misc spell fix in BPF selftests, from Colin Ian King. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, selftests: Fix spelling mistake "tained" -> "tainted" Revert "xsk: Do not sleep in poll() when need_wakeup set" ==================== Link: https://lore.kernel.org/r/20211231160050.16105-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-31 18:35:54 -08:00
Mel Gorman	8008293888	mm: vmscan: reduce throttling due to a failure to make progress -fix Hugh Dickins reported the following My tmpfs swapping load (tweaked to use huge pages more heavily than in real life) is far from being a realistic load: but it was notably slowed down by your throttling mods in 5.16-rc, and this patch makes it well again - thanks. But: it very quickly hit NULL pointer until I changed that last line to if (first_pgdat) consider_reclaim_throttle(first_pgdat, sc); The likely issue is that huge pages are a major component of the test workload. When this is the case, first_pgdat may never get set if compaction is ready to continue due to this check if (IS_ENABLED(CONFIG_COMPACTION) && sc->order > PAGE_ALLOC_COSTLY_ORDER && compaction_ready(zone, sc)) { sc->compaction_ready = true; continue; } If this was true for every zone in the zonelist, first_pgdat would never get set resulting in a NULL pointer exception. Link: https://lkml.kernel.org/r/20211209095453.GM3366@techsingularity.net Fixes: `1b4e3f26f9` ("mm: vmscan: Reduce throttling due to a failure to make progress") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@surriel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-12-31 13:12:55 -08:00
Mel Gorman	1b4e3f26f9	mm: vmscan: Reduce throttling due to a failure to make progress Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar problems due to reclaim throttling for excessive lengths of time. In Alexey's case, a memory hog that should go OOM quickly stalls for several minutes before stalling. In Mike and Darrick's cases, a small memcg environment stalled excessively even though the system had enough memory overall. Commit `69392a403f` ("mm/vmscan: throttle reclaim when no progress is being made") introduced the problem although commit `a19594ca4a` ("mm/vmscan: increase the timeout if page reclaim is not making progress") made it worse. Systems at or near an OOM state that cannot be recovered must reach OOM quickly and memcg should kill tasks if a memcg is near OOM. To address this, only stall for the first zone in the zonelist, reduce the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if the scan control nr_reclaimed is 0, kswapd is still active and there were excessive pages pending for writeback. If kswapd has stopped reclaiming due to excessive failures, do not stall at all so that OOM triggers relatively quickly. Similarly, if an LRU is simply congested, only lightly throttle similar to NOPROGRESS. Alexey's original case was the most straight forward for i in {1..3}; do tail /dev/zero; done On vanilla 5.16-rc1, this test stalled heavily, after the patch the test completes in a few seconds similar to 5.15. Alexey's second test case added watching a youtube video while tail runs 10 times. On 5.15, playback only jitters slightly, 5.16-rc1 stalls a lot with lots of frames missing and numerous audio glitches. With this patch applies, the video plays similarly to 5.15. [lkp@intel.com: Fix W=1 build warning] Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net Link: https://lore.kernel.org/r/20211202150614.22440-1-mgorman@techsingularity.net Link: https://linux-regtracking.leemhuis.info/regzbot/regression/20211124011954.7cab9bb4@mail.inbox.lv/ Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv> Reported-and-tested-by: Mike Galbraith <efault@gmx.de> Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Hugh Dickins <hughd@google.com> Tracked-by: Thorsten Leemhuis <regressions@leemhuis.info> Fixes: `69392a403f` ("mm/vmscan: throttle reclaim when no progress is being made") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-12-31 11:17:07 -08:00
Linus Torvalds	f87bcc88f3	Merge branch 'akpm' (patches from Andrew) Merge misc mm fixes from Andrew Morton: "2 patches. Subsystems affected by this patch series: mm (userfaultfd and damon)" * akpm: mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()' userfaultfd/selftests: fix hugetlb area allocations	2021-12-31 09:28:48 -08:00
Linus Torvalds	e46227bf38	SCSI fixes on 20211231 Three fixes, all in drivers. The lpfc one doesn't look exploitable, but nasty things could happen in string operations if mybuf ends up with an on stack unterminated string. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYc8YLiYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdUhAQCVmqLx GhEK15Y8etJwMoj03I6hO5gChhQz6kk7pxXAVwD/e5LHrVVeq/WxjUnyrC1gx6sm iYHYbZ0UHotwbRpwU9k= =WAIf -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three fixes, all in drivers. The lpfc one doesn't look exploitable, but nasty things could happen in string operations if mybuf ends up with an on stack unterminated string" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: vmw_pvscsi: Set residual data length conditionally scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown() scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()	2021-12-31 09:22:25 -08:00
SeongJae Park	ebb3f994dd	mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()' DAMON debugfs interface increases the reference counts of 'struct pid's for targets from the 'target_ids' file write callback ('dbgfs_target_ids_write()'), but decreases the counts only in DAMON monitoring termination callback ('dbgfs_before_terminate()'). Therefore, when 'target_ids' file is repeatedly written without DAMON monitoring start/termination, the reference count is not decreased and therefore memory for the 'struct pid' cannot be freed. This commit fixes this issue by decreasing the reference counts when 'target_ids' is written. Link: https://lkml.kernel.org/r/20211229124029.23348-1-sj@kernel.org Fixes: `4bc05954d0` ("mm/damon: implement a debugfs-based user space interface") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [5.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-12-31 09:20:12 -08:00
Mike Kravetz	f5c7329718	userfaultfd/selftests: fix hugetlb area allocations Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh or any environment where there are 'just enough' hugetlb pages will always fail with: testing events (fork, remap, remove): ERROR: UFFDIO_COPY error: -12 (errno=12, line=616) The ENOMEM error code implies there are not enough hugetlb pages. However, there are free hugetlb pages but they are all reserved. There is a basic problem with the way the test allocates hugetlb pages which has existed since the test was originally written. Due to the way 'cleanup' was done between different phases of the test, this issue was masked until recently. The issue was uncovered by commit `8ba6e86408` ("userfaultfd/selftests: reinitialize test context in each test"). For the hugetlb test, src and dst areas are allocated as PRIVATE mappings of a hugetlb file. This means that at mmap time, pages are reserved for the src and dst areas. At the start of event testing (and other tests) the src area is populated which results in allocation of huge pages to fill the area and consumption of reserves associated with the area. Then, a child is forked to fault in the dst area. Note that the dst area was allocated in the parent and hence the parent owns the reserves associated with the mapping. The child has normal access to the dst area, but can not use the reserves created/owned by the parent. Thus, if there are no other huge pages available allocation of a page for the dst by the child will fail. Fix by not creating reserves for the dst area. In this way the child can use free (non-reserved) pages. Also, MAP_PRIVATE of a file only makes sense if you are interested in the contents of the file before making a COW copy. The test does not do this. So, just use MAP_ANONYMOUS \| MAP_HUGETLB to create an anonymous hugetlb mapping. There is no need to create a hugetlb file in the non-shared case. Link: https://lkml.kernel.org/r/20211217172919.7861-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-12-31 09:20:12 -08:00
David S. Miller	e63a023489	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your net-next tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg\|ret\|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:35:40 +00:00
David S. Miller	4760abaac6	Merge branch 'mpr-len-checks' David Ahern says: ==================== net: Length checks for attributes within multipath routes Add length checks for attributes within a multipath route (attributes within RTA_MULTIPATH). Motivated by the syzbot report in patch 1 and then expanded to other attributes as noted by Ido. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:32:00 +00:00
David Ahern	8bda81a4d4	lwtunnel: Validate RTA_ENCAP_TYPE attribute length lwtunnel_valid_encap_type_attr is used to validate encap attributes within a multipath route. Add length validation checking to the type. lwtunnel_valid_encap_type_attr is called converting attributes to fib{6,}_config struct which means it is used before fib_get_nhs, ip6_route_multipath_add, and ip6_route_multipath_del - other locations that use rtnh_ok and then nla_get_u16 on RTA_ENCAP_TYPE attribute. Fixes: `9ed59592e3` ("lwtunnel: fix autoload of lwt modules") Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:31:59 +00:00
David Ahern	1ff15a710a	ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route Make sure RTA_GATEWAY for IPv6 multipath route has enough bytes to hold an IPv6 address. Fixes: `6b9ea5a64e` ("ipv6: fix multipath route replace error recovery") Signed-off-by: David Ahern <dsahern@kernel.org> Cc: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:31:59 +00:00
David Ahern	4619bcf913	ipv6: Check attribute length for RTA_GATEWAY in multipath route Commit referenced in the Fixes tag used nla_memcpy for RTA_GATEWAY as does the current nla_get_in6_addr. nla_memcpy protects against accessing memory greater than what is in the attribute, but there is no check requiring the attribute to have an IPv6 address. Add it. Fixes: `51ebd31815` ("ipv6: add support of equal cost multipath (ECMP)") Signed-off-by: David Ahern <dsahern@kernel.org> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:31:59 +00:00
David Ahern	664b9c4b73	ipv4: Check attribute length for RTA_FLOW in multipath route Make sure RTA_FLOW is at least 4B before using. Fixes: `4e902c5741` ("[IPv4]: FIB configuration using struct fib_config") Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:31:59 +00:00
David Ahern	7a3429bace	ipv4: Check attribute length for RTA_GATEWAY in multipath route syzbot reported uninit-value: ============================================================ BUG: KMSAN: uninit-value in fib_get_nhs+0xac4/0x1f80 net/ipv4/fib_semantics.c:708 fib_get_nhs+0xac4/0x1f80 net/ipv4/fib_semantics.c:708 fib_create_info+0x2411/0x4870 net/ipv4/fib_semantics.c:1453 fib_table_insert+0x45c/0x3a10 net/ipv4/fib_trie.c:1224 inet_rtm_newroute+0x289/0x420 net/ipv4/fib_frontend.c:886 Add helper to validate RTA_GATEWAY length before using the attribute. Fixes: `4e902c5741` ("[IPv4]: FIB configuration using struct fib_config") Reported-by: syzbot+d4b9a2851cc3ce998741@syzkaller.appspotmail.com Signed-off-by: David Ahern <dsahern@kernel.org> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:31:59 +00:00
David S. Miller	ce2b6eb409	mlx5-updates-2021-12-28 mlx5 Software steering, New features and optimizations This patch series brings various SW steering features, optimizations and debug-ability focused improvements. 1) Expose debugfs for dumping the SW steering resources 2) Removing unused fields 3) support for matching on new fields 4) steering optimization for RX/TX-only rules 5) Make Software steering the default steering mechanism when available, applies only to Switchdev mode FDB From Yevgeny Kliteynik and Muhammad Sammar: - Patch 1 fixes an error flow in creating matchers - Patch 2 fix lower case macro prefix "mlx5_" to "MLX5_" - Patch 3 removes unused struct member in mlx5dr_matcher - Patch 4 renames list field in matcher struct to list_node to reflect the fact that is field is for list node that is stored on another struct's lists - Patch 5 adds checking for valid Flex parser ID value - Patch 6 adds the missing reserved fields to dr_match_param and aligns it to the format that is defined by HW spec - Patch 7 adds support for dumping SW steering (SMFS) resources using debugfs in CSV format: domain and its tables, matchers and rules - Patch 8 adds support for a new destination type - UPLINK - Patch 9 adds WARN_ON_ONCE on refcount checks in SW steering object destructors - Patches 10, 11, 12 add misc5 flow table match parameters and add support for matching on tunnel headers 0 and 1 - Patch 13 adds support for matching on geneve_tlv_option_0_exist field - Patch 14 implements performance optimization for for empty or RX/TX-only matchers by splitting RX and TX matchers handling: matcher connection in the matchers chain is split into two separate lists (RX only and TX only), which solves a usecase of many RX or TX only rules that create a long chain of RX/TX-only paths w/o the actual rules - Patch 15 ignores modify TTL if device doesn't support it instead of adding and unsupported action - Patch 16 sets SMFS as a default steering mode -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmHOvKsACgkQSD+KveBX +j4CpQf8Cc3NkWhGVYYhRBlntdpyVTDpoVhw6RDXVwboIRZ3GcMm81SxgwKrDuUx Yhup4K1CNKt44D1RRhX4ElSemfo/afxfGIcq7S87vciUOaebWTIZgRyNvuYr/buI v9LIM7zTb1aXL7m3KQHOGc7cucVRvsNjteTxvp/DR0bPFEuzAr5tw9Y5qop9pBbM cfgdmXKUoRxA039mqZ2Fl7y+z51zRkfsCza7lEHcGgpvwOLubFEaghj7YTIT6h1m +We8w3/+B0mqS0HpByIxmyf/EQBG5Hl7FTbFj1Somn7EFEP4E+LaFzg8nGsUHEeI f5027uM7XGlThAUrgdaOMcwBfIuOww== =BE8m -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2021-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 Software steering, New features and optimizations This patch series brings various SW steering features, optimizations and debug-ability focused improvements. 1) Expose debugfs for dumping the SW steering resources 2) Removing unused fields 3) support for matching on new fields 4) steering optimization for RX/TX-only rules 5) Make Software steering the default steering mechanism when available, applies only to Switchdev mode FDB From Yevgeny Kliteynik and Muhammad Sammar: - Patch 1 fixes an error flow in creating matchers - Patch 2 fix lower case macro prefix "mlx5_" to "MLX5_" - Patch 3 removes unused struct member in mlx5dr_matcher - Patch 4 renames list field in matcher struct to list_node to reflect the fact that is field is for list node that is stored on another struct's lists - Patch 5 adds checking for valid Flex parser ID value - Patch 6 adds the missing reserved fields to dr_match_param and aligns it to the format that is defined by HW spec - Patch 7 adds support for dumping SW steering (SMFS) resources using debugfs in CSV format: domain and its tables, matchers and rules - Patch 8 adds support for a new destination type - UPLINK - Patch 9 adds WARN_ON_ONCE on refcount checks in SW steering object destructors - Patches 10, 11, 12 add misc5 flow table match parameters and add support for matching on tunnel headers 0 and 1 - Patch 13 adds support for matching on geneve_tlv_option_0_exist field - Patch 14 implements performance optimization for for empty or RX/TX-only matchers by splitting RX and TX matchers handling: matcher connection in the matchers chain is split into two separate lists (RX only and TX only), which solves a usecase of many RX or TX only rules that create a long chain of RX/TX-only paths w/o the actual rules - Patch 15 ignores modify TTL if device doesn't support it instead of adding and unsupported action - Patch 16 sets SMFS as a default steering mode ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:29:31 +00:00
David S. Miller	20a9013eba	Merge branch 'hnsd3-next' Guangbin Huang says: ==================== net: hns3: refactor cmdq functions in PF/VF Currently, hns3 PF and VF module have two sets of cmdq APIs to provide cmdq message interaction functions. Most of these APIs are the same. The only differences are the function variables and names with pf and vf suffixes. These two sets of cmdq APIs are redundent and add extra bug fix work. This series refactor the cmdq APIs in hns3 PF and VF by implementing one set of common cmdq APIs for PF and VF reuse and deleting the old APIs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:47 +00:00
Jie Wang	aab8d1c6a5	net: hns3: delete the hclge_cmd.c and hclgevf_cmd.c currently most cmdq APIs are unified in hclge_comm_cmd.c. Newly developed cmdq APIs should also be placed in hclge_comm_cmd.c. So there is no need to keep hclge_cmd.c and hclgevf_cmd.c. This patch moves the hclge(vf)_cmd_send to hclge(vf)_main.c and deletes the source files and makefile scripts. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:47 +00:00
Jie Wang	cb413bfa6e	net: hns3: refactor VF cmdq init and uninit APIs with new common APIs This patch uses common cmdq init and uninit APIs to replace the old APIs in VF cmdq module init and uninit module. Then the old VF init and uninit APIs is deleted. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:47 +00:00
Jie Wang	8e2288cad6	net: hns3: refactor PF cmdq init and uninit APIs with new common APIs This patch uses common cmdq init and uninit APIs to replace the old APIs in PF cmdq module init and uninit modules. Then the old PF init and uninit APIs is deleted. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:47 +00:00
Jie Wang	0b04224c13	net: hns3: create common cmdq init and uninit APIs The PF and VF cmdq init and uninit APIs are also almost same espect the suffixes of API names. This patch creates common cmdq init and uninit APIs needed by PF and VF cmdq modules. The next patch will use the new unified APIs to replace init and uninit APIs in PF module. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:47 +00:00
Jie Wang	745f0a19ee	net: hns3: refactor VF cmdq resource APIs with new common APIs This patch uses common cmdq resource allocate/free/query APIs to replace the old APIs in VF cmdq module and deletes the old cmdq resource APIs. Still we kept hclgevf_cmd_setup_basic_desc name as a seam API to avoid too many meaningless replacement. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:47 +00:00
Jie Wang	d3c69a8812	net: hns3: refactor PF cmdq resource APIs with new common APIs This patch uses common cmdq resource allocate/free/query APIs to replace the old APIs in PF cmdq module and deletes the old cmdq resource APIs. Still we kept hclge_cmd_setup_basic_desc name as a seam API to avoid too many meaningless replacement. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:47 +00:00
Jie Wang	da77aef9cc	net: hns3: create common cmdq resource allocate/free/query APIs The PF and VF cmdq module resource allocate/free/query APIs are almost the same espect the suffixes of API names. These same implementations bring double development and bugfix work. This patch creates common cmdq resource allocate/free/query APIs called by PF and VF cmdq init/uninit APIs. The next patch will use the new unified APIs to replace init/uninit APIs. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:46 +00:00
Jie Wang	076bb53757	net: hns3: refactor hclgevf_cmd_send with new hclge_comm_cmd_send API This patch firstly uses new hardware description struct hclge_comm_hw as child member of hclgevf_hw and deletes the old hardware description child members. All the hclgevf_hw variables used in VF module is modified according to the new hclgevf_hw. Secondly hclgevf_cmd_send is refactored to use hclge_comm_cmd_send APIs. The old functions called by hclgevf_cmd_send are all deleted. Still we kept hclgevf_cmd_send to avoid too many meaningless modifications. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:46 +00:00
Jie Wang	eaa5607db3	net: hns3: refactor hclge_cmd_send with new hclge_comm_cmd_send API This patch firstly uses new hardware description struct hclge_comm_hw as child member of hclge_hw and deletes the original child memebers of hclge_hw. All the hclge_hw variables used in PF module is modified according to the new hclge_hw. Secondly hclge_cmd_send is refactored to use hclge_comm_cmd_send APIs. The old functions called by hclge_cmd_send are deleted and hclge_cmd_send is kept to avoid too many meaningless modifications. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:46 +00:00
Jie Wang	8d307f8e8c	net: hns3: create new set of unified hclge_comm_cmd_send APIs This patch create new set of unified hclge_comm_cmd_send APIs for PF and VF cmdq module. Subfunctions called by hclge_comm_cmd_send are also created include cmdq result check, cmdq return code conversion and ring space opertaion APIs. These new common cmdq APIs will be used to replace the old PF and VF cmdq APIs in next patches. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:46 +00:00
Jie Wang	6befad603d	net: hns3: use struct hclge_desc to replace hclgevf_desc in VF cmdq module This patch use new common struct hclge_desc to replace struct hclgevf_desc in VF cmdq module and then delete the old struct hclgevf_desc. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:46 +00:00
Jie Wang	0a7b6d2218	net: hns3: create new cmdq hardware description structure hclge_comm_hw Currently PF and VF cmdq APIs use struct hclge(vf)_hw to describe cmdq hardware information needed by hclge(vf)_cmd_send. There are a little differences between its child struct hclge_cmq_ring and hclgevf_cmq_ring. It is redundent to use two sets of structures to support same functions. So this patch creates new set of common cmdq hardware description structures(hclge_comm_hw) to unify PF and VF cmdq functions. The struct hclge_desc is still kept to avoid too many meaningless replacement. These new structures will be used to unify hclge(vf)_hw structures in PF and VF cmdq APIs in next patches. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:46 +00:00
Jie Wang	5f20be4e90	net: hns3: refactor hns3 makefile to support hns3_common module Currently we plan to refactor PF and VF cmdq module. A new file folder hns3_common will be created to store new common APIs used by PF and VF cmdq module. Thus the PF and VF compilation process will both depends on the hns3_common. This may cause parallel building problems if we add a new makefile building unit. So this patch combined the PF and VF makefile scripts to the top level makefile to support the new hns3_common which will be created in the next patch. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:25:46 +00:00
Deep Majumder	c116fe1e18	Docs: Fixes link to I2C specification The link to the I2C specification is broken. Although "https://www.nxp.com" hosts Rev 7 (2021) of this specification, it is behind a login-wall. Thus, an additional link has been added (which doesn't require a login) and the NXP official docs link has been updated. Signed-off-by: Deep Majumder <deep@fastmail.in> [wsa: minor updates to text and commit message] Signed-off-by: Wolfram Sang <wsa@kernel.org>	2021-12-31 14:39:28 +01:00
Pavel Skripkin	bb436283e2	i2c: validate user data in compat ioctl Wrong user data may cause warning in i2c_transfer(), ex: zero msgs. Userspace should not be able to trigger warnings, so this patch adds validation checks for user data in compact ioctl to prevent reported warnings Reported-and-tested-by: syzbot+e417648b303855b91d8a@syzkaller.appspotmail.com Fixes: `7d5cb45655` ("i2c compat ioctls: move to ->compat_ioctl()") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>	2021-12-31 14:28:22 +01:00
Yevgeny Kliteynik	aa36c94853	net/mlx5: Set SMFS as a default steering mode if device supports it Set SMFS (SW-managed flow steering) as a default steering mode instead of DMFS (device-managed flow steering) In SMFS, the driver writes the STEs (Steering Table Entries) directly to the device's ICM, which allows for a higher rule insertion rate than through using FW command interface, as it is done in DMFS. SMFS/DMFS steering modes can be configured through devlink param 'flow_steering_mode'. The possible values are 'smfs' or 'dmfs'. The desired 'flow_steering_mode' param value should be set before enabling switchdev mode. Example: # devlink dev param set pci/0000:05:00.0 name flow_steering_mode smfs # devlink dev eswitch set pci/0000:05:00.0 mode switchdev Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>	2021-12-31 00:17:44 -08:00
Yevgeny Kliteynik	4ff725e1d4	net/mlx5: DR, Ignore modify TTL if device doesn't support it When modifying TTL, packet's csum has to be recalculated. Due to HW issue in ConnectX-5, csum recalculation for modify TTL is supported through a work-around that is specifically enabled by configuration. If the work-around isn't enabled, ignore the modify TTL action rather than adding an unsupported action. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>	2021-12-31 00:17:41 -08:00
Yevgeny Kliteynik	cc2295cd54	net/mlx5: DR, Improve steering for empty or RX/TX-only matchers Every matcher has RX and TX paths. When a new matcher is created, its RX and TX start/end anchors are connected to the respective RX and TX anchors of the previous and next matchers. This creates a potential performance issue: when a certain rule is added to a matcher, in many cases it is RX or TX only rule, which may create a long chain of RX/TX-only paths w/o the actual rules. This patch aims to handle this issue. RX and TX matchers are now handled separately: matcher connection in the matchers chain is split into two separate lists: RX only and TX only. when a new matcher is created, it is initially created 'detached' - its RX/TX members are not inserted into the table's matcher list. When an actual rule is added, only its appropriate RX or TX nic matchers are then added to the table's nic matchers list and inserted into its place in the chain of matchers. I.e., if the rule that is being added is an RX-only rule, only the RX part of the matcher will be connected to the chain, while TX part of the matcher remains detached and doesn't prolong the TX chain of the matchers. Same goes for rule deletion: when the last RX/TX rule of the nic matcher is destroyed, the nic matcher is removed from its list. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>	2021-12-31 00:17:37 -08:00
Yevgeny Kliteynik	f59464e257	net/mlx5: DR, Add support for matching on geneve_tlv_option_0_exist field Match on geneve_tlv_option_0_exist field on devices that support STEv1. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>	2021-12-31 00:17:34 -08:00
Muhammad Sammar	09753babaf	net/mlx5: DR, Support matching on tunnel headers 0 and 1 Tunnel headers are generic encapsulation headers, applies for all tunneling protocols identified by the device native parser or by the programmable parser, this support will enable raw matching headers 0 and 1. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>	2021-12-31 00:17:30 -08:00
Muhammad Sammar	8c2b4fee9c	net/mlx5: DR, Add misc5 to match_param structs Add misc5 match params to enable matching tunnel headers. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>	2021-12-31 00:17:27 -08:00
Muhammad Sammar	0f2a6c3b92	net/mlx5: Add misc5 flow table match parameters Add support for misc5 match parameter as per HW spec, this will allow matching on tunnel_header fields. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>	2021-12-31 00:17:23 -08:00
Yevgeny Kliteynik	b54128275e	net/mlx5: DR, Warn on failure to destroy objects due to refcount Add WARN_ON_ONCE on refcount checks in SW steering object destructors Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>	2021-12-31 00:17:20 -08:00
Yevgeny Kliteynik	e3a0f40b2f	net/mlx5: DR, Add support for UPLINK destination type Add support for a new destination type - UPLINK. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>	2021-12-31 00:17:17 -08:00
Muhammad Sammar	9222f0b27d	net/mlx5: DR, Add support for dumping steering info Extend mlx5 debugfs support to present Software Steering resources: dr_domain including it's tables, matchers and rules. The interface is read-only. While dump is being presented, new steering rules cannot be inserted/deleted. The steering information is dumped in the CSV form with the following format: <object_type>,<object_ID>, <object_info>,...,<object_info> This data can be read at the following path: /sys/kernel/debug/mlx5/<BDF>/steering/fdb/<domain_handle> Example: # cat /sys/kernel/debug/mlx5/0000:82:00.0/steering/fdb/dmn_000018644 3100,0x55caa4621c50,0xee802,4,65533 3101,0x55caa4621c50,0xe0100008 Changes in V2: - Reduce temp hex buffer size and avoid unnecessary memset - Use bin2hex() instead of DIY loop - Don't check debugfs functions return values Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>	2021-12-31 00:17:13 -08:00

... 4 5 6 7 8 ...

1062307 Commits All Branches Search

1062307 Commits

All Branches