OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Paolo Abeni	71b7dec27f	mptcp: less aggressive retransmission strategy The current mptcp re-inject strategy is very aggressive, we have mptcp-level retransmissions even on single subflow connection, if the link in-use is lossy. Let's be a little more conservative: we do retransmit only if at least a subflow has write and rtx queue empty. Additionally use the backup subflows only if the active subflows are stale - no progresses in at least an rtx period and ignore stale subflows for rtx timeout update Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 11:37:25 +01:00
Paolo Abeni	33d41c9cd7	mptcp: more accurate timeout As reported by Maxim, we have a lot of MPTCP-level retransmissions when multilple links with different latencies are in use. This patch refactor the mptcp-level timeout accounting so that the maximum of all the active subflow timeout is used. To avoid traversing the subflow list multiple times, the update is performed inside the packet scheduler. Additionally clean-up a bit timeout handling. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 11:37:25 +01:00
Linus Torvalds	dfa377c35d	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "7 patches. Subsystems affected by this patch series: mm (kasan, mm/slub, mm/madvise, and memcg), and lib" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: lib: use PFN_PHYS() in devmem_is_allowed() mm/memcg: fix incorrect flushing of lruvec data in obj_stock mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ\|WRITE) mm: slub: fix slub_debug disabling for list of slabs slub: fix kmalloc_pagealloc_invalid_free unit test kasan, slub: reset tag when printing address kasan, kmemleak: reset tags when scanning block	2021-08-13 15:05:23 -10:00
Arnd Bergmann	e5f3155267	ethernet: fix PTP_1588_CLOCK dependencies The 'imply' keyword does not do what most people think it does, it only politely asks Kconfig to turn on another symbol, but does not prevent it from being disabled manually or built as a loadable module when the user is built-in. In the ICE driver, the latter now causes a link failure: aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl': ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config' ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config' aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config' ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset': ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release' ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild': This is a recurring problem in many drivers, and we have discussed it several times befores, without reaching a consensus. I'm providing a link to the previous email thread for reference, which discusses some related problems. To solve the dependency issue better than the 'imply' keyword, introduce a separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver can depend on if it is able to use PTP support when available, but works fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are then prevented from being built-in, the same way as with a 'depends on PTP_1588_CLOCK \|\| !PTP_1588_CLOCK' dependency that does the same trick, but that can be rather confusing when you first see it. Since this should cover the dependencies correctly, the IS_REACHABLE() hack in the header is no longer needed now, and can be turned back into a normal IS_ENABLED() check. Any driver that gets the dependency wrong will now cause a link time failure rather than being unable to use PTP support when that is in a loadable module. However, the two recently added ptp_get_vclocks_index() and ptp_convert_timestamp() interfaces are only called from builtin code with ethtool and socket timestamps, so keep the current behavior by stubbing those out completely when PTP is in a loadable module. This should be addressed properly in a follow-up. As Richard suggested, we may want to actually turn PTP support into a 'bool' option later on, preventing it from being a loadable module altogether, which would be one way to solve the problem with the ethtool interface. Fixes: `06c16d89d2` ("ice: register 1588 PTP clock device object for E810 devices") Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/ Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/ Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/ Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/ Acked-by: Shannon Nelson <snelson@pensando.io> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 17:49:05 -07:00
Linus Torvalds	27b2eaa118	4 CIFS/SMB3 Fixes, all for stable, 2 relating to deferred close, 1 for modefromsid mount option -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmEW5MQACgkQiiy9cAdy T1H6zQwAwOyQrMGA3VtsatmhhQo89oHbkUB4q4+tY7xabPNgTRAPT9BpRSFsrW4y njeiaP1T7nmiU1yJsYQD/JwRv005pBO/xa7sMsLb0RSca//kzin4WTTzxom3RUBW hU29PvAxl+AjaRvbo6VpSn6xuHH1BDcZU+YfWtX3c6tE30sdzwjyMu7rEhivNGCf 0ukIZuIaEJrZmCwMZHT8+qE1dBOKEad8I39POG1v+mybQCWJvo4MAUDyYHZYKTJz 6e3JDARI19G8hfq1oVAM/g5gIBRBEISug3jenOMVG/QnBnRsBvyuTIunoq4ba+S6 qp3jv3p24DaUe9FPp5sYyAhuJHo0rzSwGBv/SdikJA+3xb8k2E3rECAUbf4j/NmV /sj0tN/6/Z05/6L4ZgqUjdS2KfLztDvgzTGnv/LsB097Nhb8hVIhsKoqzypmyAcH 5AsjXFETMCclWE7oE8DkzaAOJqKD7MGJ6KXCjbt8JcFb/b6L/QQbRyRB5ggupgVn Ic7gn/iM =RlpY -----END PGP SIGNATURE----- Merge tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Four CIFS/SMB3 Fixes, all for stable, two relating to deferred close, and one for the 'modefromsid' mount option (when 'idsfromsid' not specified)" * tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Call close synchronously during unlink/rename/lease break. cifs: Handle race conditions during rename cifs: use the correct max-length for dentry_path_raw() cifs: create sd context must be a multiple of 8	2021-08-13 14:44:32 -10:00
Linus Torvalds	a83ed22577	linux-kselftest-fixes-5.14-rc6 This Kselftest fixes update for Linux 5.14-rc6 consists of a single patch to sgx test to fix Q1 and Q2 calculation. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmEW0ZkACgkQCwJExA0N QxzJ2BAAvybJttpYcfSPPAjtMksWLR96QbMiU9AbgwMrBNePKwdnAwg21uI1qEKT HOEED0LqmPN3dcUMPtLAj/yGBMXu/ByAL3/A1q613JUatauUyT1hZp870vbSac8+ RB+RNGa9Hgw8Xg0tAGgtdCMr5p46lE/6oGeAY179DTsm5Haj0x0Kho87REWaAKL+ xvPtD+SEBNm4yOUlUzUlmeO30g/Zjh0WJRp7biEarhx+z/HS7kE43G88NGJY32Vf MYmAIEf7dOw9Z3ZFyng6K5l6DPM5pep6FTI6GP1EKbFse4YdGYE573qSDfsusDGv 9XKTZny4WIcKUXMR6vCM4vGaruOavhLtJLM7L8/FEOFcGfr+LH4RRxikSG1YSxwv L8wvdzYw3pY3agovaR1zLqTlXIYi281EXhpAorQFA+r3PySCXXkOb+ZhtYUbXUId 3qQj6fuvMv094mR38a5C+yAwFV6V6ojmQjho2XZfkMaoSYFGsMt0L23zHpsnbmb/ frTakZaLkUY2Y9mZ06JaevhP6ebtYBYj6yOwFz3JRqi7rt/ArL7OsfKdEoDNNzQE LwDWOzkvBRTZuZ9CbYTxQpKT0rnSGhZpBig+hBFKGZkqOd4884Z6L5eBB7DQLZ/F rifqdxtxiHZO4Q7WikebvSX7M+LXUJzOIrQML4acxkQjiNVoGYw= =wKY6 -----END PGP SIGNATURE----- Merge tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fix from Shuah Khan: "A single patch to sgx test to fix Q1 and Q2 calculation" * tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c	2021-08-13 14:32:38 -10:00
Maciej Machnikowski	5f77351963	ice: Fix perout start time rounding Internal tests found out that the latest code doesn't bring up 1PPS out as expected. As a result of incorrect define used to round the time up the time was round down to the past second boundary. Fix define used for rounding to properly round up to the next Top of second in ice_ptp_cfg_clkout to fix it. Fixes: `172db5f91d` ("ice: add support for auxiliary input/output pins") Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20210813165018.2196013-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 17:22:53 -07:00
Liang Wang	854f32648b	lib: use PFN_PHYS() in devmem_is_allowed() The physical address may exceed 32 bits on 32-bit systems with more than 32 bits of physcial address. Use PFN_PHYS() in devmem_is_allowed(), or the physical address may overflow and be truncated. We found this bug when mapping a high addresses through devmem tool, when CONFIG_STRICT_DEVMEM is enabled on the ARM with ARM_LPAE and devmem is used to map a high address that is not in the iomem address range, an unexpected error indicating no permission is returned. This bug was initially introduced from v2.6.37, and the function was moved to lib in v5.11. Link: https://lkml.kernel.org/r/20210731025057.78825-1-wangliang101@huawei.com Fixes: `087aaffcdf` ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem") Fixes: `527701eda5` ("lib: Add a generic version of devmem_is_allowed()") Signed-off-by: Liang Wang <wangliang101@huawei.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Liang Wang <wangliang101@huawei.com> Cc: Xiaoming Ni <nixiaoming@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <stable@vger.kernel.org> [2.6.37+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-13 14:09:32 -10:00
Waiman Long	7fa0dacbaf	mm/memcg: fix incorrect flushing of lruvec data in obj_stock When mod_objcg_state() is called with a pgdat that is different from that in the obj_stock, the old lruvec data cached in obj_stock are flushed out. Unfortunately, they were flushed to the new pgdat and so the data go to the wrong node. This will screw up the slab data reported in /sys/devices/system/node/node*/meminfo. Fix that by flushing the data to the cached pgdat instead. Link: https://lkml.kernel.org/r/20210802143834.30578-1-longman@redhat.com Fixes: `68ac5b3c8d` ("mm/memcg: cache vmstat data in percpu memcg_stock_pcp") Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Chris Down <chris@chrisdown.name> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-13 14:09:32 -10:00
David Hildenbrand	eb2faa513c	mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ\|WRITE) Doing some extended tests and polishing the man page update for MADV_POPULATE_(READ\|WRITE), I realized that we end up converting also SIGBUS (via -EFAULT) to -EINVAL, making it look like yet another madvise() user error. We want to report only problematic mappings and permission problems that the user could have know as -EINVAL. Let's not convert -EFAULT arising due to SIGBUS (or SIGSEGV) to -EINVAL, but instead indicate -EFAULT to user space. While we could also convert it to -ENOMEM, using -EFAULT looks more helpful when user space might want to troubleshoot what's going wrong: MADV_POPULATE_(READ\|WRITE) is not part of an final Linux release and we can still adjust the behavior. Link: https://lkml.kernel.org/r/20210726154932.102880-1-david@redhat.com Fixes: `4ca9b3859d` ("mm/madvise: introduce MADV_POPULATE_(READ\|WRITE) to prefault page tables") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@surriel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-13 14:09:31 -10:00
Vlastimil Babka	a7f1d48585	mm: slub: fix slub_debug disabling for list of slabs Vijayanand Jitta reports: Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we would want to disable slub_debug for few slabs. Using boot parameter with slub_debug=-,slab_name syntax doesn't work as expected i.e; only disabling debugging for the specified list of slabs. Instead it disables debugging for all slabs, which is wrong. This patch fixes it by delaying the moment when the global slub_debug flags variable is updated. In case a "slub_debug=-,slab_name" has been passed, the global flags remain as initialized (depending on CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0. Link: https://lkml.kernel.org/r/8a3d992a-473a-467b-28a0-4ad2ff60ab82@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Vijayanand Jitta <vjitta@codeaurora.org> Reviewed-by: Vijayanand Jitta <vjitta@codeaurora.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-13 14:09:31 -10:00
Shakeel Butt	1ed7ce574c	slub: fix kmalloc_pagealloc_invalid_free unit test The unit test kmalloc_pagealloc_invalid_free makes sure that for the higher order slub allocation which goes to page allocator, the free is called with the correct address i.e. the virtual address of the head page. Commit `f227f0faf6` ("slub: fix unreclaimable slab stat for bulk free") unified the free code paths for page allocator based slub allocations but instead of using the address passed by the caller, it extracted the address from the page. Thus making the unit test kmalloc_pagealloc_invalid_free moot. So, fix this by using the address passed by the caller. Should we fix this? I think yes because dev expect kasan to catch these type of programming bugs. Link: https://lkml.kernel.org/r/20210802180819.1110165-1-shakeelb@google.com Fixes: `f227f0faf6` ("slub: fix unreclaimable slab stat for bulk free") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-13 14:09:31 -10:00
Kuan-Ying Lee	340caf178d	kasan, slub: reset tag when printing address The address still includes the tags when it is printed. With hardware tag-based kasan enabled, we will get a false positive KASAN issue when we access metadata. Reset the tag before we access the metadata. Link: https://lkml.kernel.org/r/20210804090957.12393-3-Kuan-Ying.Lee@mediatek.com Fixes: `aa1ef4d7b3` ("kasan, mm: reset tags when accessing metadata") Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Nicholas Tang <nicholas.tang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-13 14:09:31 -10:00
Kuan-Ying Lee	6c7a00b843	kasan, kmemleak: reset tags when scanning block Patch series "kasan, slub: reset tag when printing address", v3. With hardware tag-based kasan enabled, we reset the tag when we access metadata to avoid from false alarm. This patch (of 2): Kmemleak needs to scan kernel memory to check memory leak. With hardware tag-based kasan enabled, when it scans on the invalid slab and dereference, the issue will occur as below. Hardware tag-based KASAN doesn't use compiler instrumentation, we can not use kasan_disable_current() to ignore tag check. Based on the below report, there are 11 0xf7 granules, which amounts to 176 bytes, and the object is allocated from the kmalloc-256 cache. So when kmemleak accesses the last 256-176 bytes, it causes faults, as those are marked with KASAN_KMALLOC_REDZONE == KASAN_TAG_INVALID == 0xfe. Thus, we reset tags before accessing metadata to avoid from false positives. BUG: KASAN: out-of-bounds in scan_block+0x58/0x170 Read at addr f7ff0000c0074eb0 by task kmemleak/138 Pointer tag: [f7], memory tag: [fe] CPU: 7 PID: 138 Comm: kmemleak Not tainted 5.14.0-rc2-00001-g8cae8cd89f05-dirty #134 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x1b0 show_stack+0x1c/0x30 dump_stack_lvl+0x68/0x84 print_address_description+0x7c/0x2b4 kasan_report+0x138/0x38c __do_kernel_fault+0x190/0x1c4 do_tag_check_fault+0x78/0x90 do_mem_abort+0x44/0xb4 el1_abort+0x40/0x60 el1h_64_sync_handler+0xb4/0xd0 el1h_64_sync+0x78/0x7c scan_block+0x58/0x170 scan_gray_list+0xdc/0x1a0 kmemleak_scan+0x2ac/0x560 kmemleak_scan_thread+0xb0/0xe0 kthread+0x154/0x160 ret_from_fork+0x10/0x18 Allocated by task 0: kasan_save_stack+0x2c/0x60 __kasan_kmalloc+0xec/0x104 __kmalloc+0x224/0x3c4 __register_sysctl_paths+0x200/0x290 register_sysctl_table+0x2c/0x40 sysctl_init+0x20/0x34 proc_sys_init+0x3c/0x48 proc_root_init+0x80/0x9c start_kernel+0x648/0x6a4 __primary_switched+0xc0/0xc8 Freed by task 0: kasan_save_stack+0x2c/0x60 kasan_set_track+0x2c/0x40 kasan_set_free_info+0x44/0x54 ____kasan_slab_free.constprop.0+0x150/0x1b0 __kasan_slab_free+0x14/0x20 slab_free_freelist_hook+0xa4/0x1fc kfree+0x1e8/0x30c put_fs_context+0x124/0x220 vfs_kern_mount.part.0+0x60/0xd4 kern_mount+0x24/0x4c bdev_cache_init+0x70/0x9c vfs_caches_init+0xdc/0xf4 start_kernel+0x638/0x6a4 __primary_switched+0xc0/0xc8 The buggy address belongs to the object at ffff0000c0074e00 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 176 bytes inside of 256-byte region [ffff0000c0074e00, ffff0000c0074f00) The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100074 head:(____ptrval____) order:2 compound_mapcount:0 compound_pincount:0 flags: 0xbfffc0000010200(slab\|head\|node=0\|zone=2\|lastcpupid=0xffff\|kasantag=0x0) raw: 0bfffc0000010200 0000000000000000 dead000000000122 f5ff0000c0002300 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff0000c0074c00: f0 f0 f0 f0 f0 f0 f0 f0 f0 fe fe fe fe fe fe fe ffff0000c0074d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe >ffff0000c0074e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe ^ ffff0000c0074f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe ffff0000c0075000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint kmemleak: 181 new suspected memory leaks (see /sys/kernel/debug/kmemleak) Link: https://lkml.kernel.org/r/20210804090957.12393-1-Kuan-Ying.Lee@mediatek.com Link: https://lkml.kernel.org/r/20210804090957.12393-2-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Marco Elver <elver@google.com> Cc: Nicholas Tang <nicholas.tang@mediatek.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-08-13 14:09:31 -10:00
Ivan Bornyakov	b697d9d38a	net: phy: marvell: add SFP support for 88E1510 Add support for SFP cages connected to the Marvell 88E1512 transceiver. 88E1512 supports for SGMII/1000Base-X/100Base-FX media type with RGMII on system interface. Configure PHY to appropriate mode depending on the type of SFP inserted. On SFP removal configure PHY to the RGMII-copper mode so RJ-45 port can still work. Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru> Link: https://lore.kernel.org/r/20210812134256.2436-1-i.bornyakov@metrotek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 17:07:15 -07:00
Linus Torvalds	020efdadd8	block-5.14-2021-08-13 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEWwSAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgprd0D/90ziZdNQdPtU+bTYX9mRnLb6zEB+qyCvFP w0JFk17WoLcFwUm3gTaBPhztWjh9v1O9iInpl+QkzffvBASv3ysjOx0ioHsYjpRq CTupoH8dPyor8cWagTsX9i4ZoeSlo5x49uUZPiEskVq3ioy0HF5SbASzQrTs/SIZ 6uwI/JTkF/xOHPyvpOk+U7QffcC10mcflfwX3vHFL4FM5DWxWhNWy0Y/FSWfQlWz HviWwGjX1uqsoggFIfgUXy32E3oJM6FNVSNcP60dF+wpPtQ4ufz6hRf/epwKjOKm B8rQotlEhD9EHY37u8aCkdnDwK+ILRnl4VKw4zWYSsjwkrpfvAd78XaPTYUYoMEb IulTtSokENK1BNw4XLTyh9KzrxU90Z9lip6Khv4s+cg3Xs3MUC75M8TO/UH1mx4H Fsg7c86bZSwEcGj9McRhFAg0PRcTWsjsIFmI43WME3w6FkVCxB7lmSR55nmNCTXC ZkaLXBKtaaJfu8ehMyKDz6xK39GKW1GZIlYD3+MMKep4gJb3UB4Oin8XYoeLQquU 28fQfk2zsmdPaJAnBK4s3Jlq5cQZ7I+oMHlQ8cGkNtJmFHs5mpFVdMa4gLt9YpMX 8UZrrweQOEFWgfxDIOlPbYN2vjXaCRYWgBmKPa7QI9awNJHTsOH09YzBhjWBauIb 8RMC1Ur7Mg== =LfC0 -----END PGP SIGNATURE----- Merge tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few fixes for block that should go into 5.14: - Revert the mq-deadline cgroup addition. More work is needed on this front, let's revert it for now and get it right before having it in a released kernel (Tejun) - blk-iocost lockdep fix (Ming) - nbd double completion fix (Xie) - Fix for non-idling when clearing the shared tag flag (Yu)" * tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block: nbd: Aovid double completion of a request blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED Revert "block/mq-deadline: Add cgroup support" blk-iocost: fix lockdep warning on blkcg->lock	2021-08-13 13:36:42 -10:00
Jakub Kicinski	a44fc4b6af	Merge branch 'kconfig-symbol-clean-up-on-net' Lukas Bulwahn says: ==================== Kconfig symbol clean-up on net The script ./scripts/checkkconfigsymbols.py warns on invalid references to Kconfig symbols (often, minor typos, name confusions or outdated references). This patch series addresses all issues reported by ./scripts/checkkconfigsymbols.py in ./net/ and ./drivers/net/ for Kconfig and Makefile files. Issues in the Kconfig and Makefile files indicate some shortcomings in the overall build definitions, and often are true actionable issues to address. These issues can be identified and filtered by: ./scripts/checkkconfigsymbols.py \ \| grep -E "(drivers/)?net/.*(Kconfig\|Makefile)" -B 1 -A 1 After applying this patch series on linux-next (next-20210811), the command above yields no further issues to address. ==================== Link: https://lore.kernel.org/r/20210812083806.28434-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:30:38 -07:00
Lukas Bulwahn	f75d81556a	net: dpaa_eth: remove dead select in menuconfig FSL_DPAA_ETH The menuconfig FSL_DPAA_ETH selects config FSL_FMAN_MAC, but the config FSL_FMAN_MAC never existed in the kernel tree. Hence, ./scripts/checkkconfigsymbols.py warns: FSL_FMAN_MAC Referencing files: drivers/net/ethernet/freescale/dpaa/Kconfig Remove this dead select in menuconfig FSL_DPAA_ETH. Fixes: `9ad1a37493` ("dpaa_eth: add support for DPAA Ethernet") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:30:35 -07:00
Lukas Bulwahn	d8d9ba8dc9	net: 802: remove dead leftover after ipx driver removal Commit `7a2e838d28` ("staging: ipx: delete it from the tree") removes the ipx driver and the config IPX. Since then, there is some dead leftover in ./net/802/, that was once used by the IPX driver, but has no other user. Remove this dead leftover. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:30:35 -07:00
Lukas Bulwahn	4fb464db9c	net: Kconfig: remove obsolete reference to config MICROBLAZE_64K_PAGES Commit `05cdf45747` ("microblaze: Remove noMMU code") removes config MICROBLAZE_64K_PAGES in arch/microblaze/Kconfig. However, there is still a reference to MICROBLAZE_64K_PAGES in the config VMXNET3 in ./drivers/net/Kconfig. Remove this obsolete reference to config MICROBLAZE_64K_PAGES. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:30:35 -07:00
Linus Torvalds	42995cee61	io_uring-5.14-2021-08-13 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEWwP4QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpgqrEAC93z9w25k1mqVZLjMJgp1IKT3KsB1xfn+i QwLz1OVpveyf5fKzZEK6xPn6DhX22FmGca0FhPNYANlNJNgp8X1cT0REf+hdDbYJ IqF14Ue9DO5mbBiutb9aEYoTSZjg1HaZFPRfu71QkFZbTuVx8CoMwDsl1OXG7bcO 5G67khXZFkChmaJVgaANfR4EKKqHXrVt+EOoOyBYdSZLWloWu2KuODXt/FtmllHc kf7bPTw0Za5f6oSJXX52B+RlMyK3HrA9FZZor8alyUd0GVsgu5vwY2vFNSyft8t2 RazK/oqU+hBmV4wNVrAYAByYiq+j3lVLvE3AXPI81cXLyPyCVG0GqqRcycuUaMjV WikJyo6dGQh70yOXw3BhQojPsEL1OPcyMxIi+3g/TKL9kIYOBaO7Es+KwoEKD4fX fAkucTsb/Tc3WsYI3ELOedy9WuIMzAgysZxcTUxphKmRtwFNXLOYmQJEW5dK6MxZ sZfve0JCQsFdebHOjIIanPRJSwksTt0Xfdm1g+R8sDBfnLV81PbGY+8lrRgUiBJ9 DYDd0hRPFnBuWT8h1qToUqKcUHL73IefXstMnstdtWwrL1FZs4JsPGYdZMxqfXSk Z/50aHSmB9KaouFaIrxM4rndrAgP2BJwXOu2f1YbiPTcKNpmdYTX5jT7mfaLSWX5 ZpF6fkDiTA== =lx8b -----END PGP SIGNATURE----- Merge tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "A bit bigger than the previous weeks, but mostly just a few stable bound fixes. In detail: - Followup fixes to patches from last week for io-wq, turns out they weren't complete (Hao) - Two lockdep reported fixes out of the RT camp (me) - Sync the io_uring-cp example with liburing, as a few bug fixes never made it to the kernel carried version (me) - SQPOLL related TIF_NOTIFY_SIGNAL fix (Nadav) - Use WRITE_ONCE() when writing sq flags (Nadav) - io_rsrc_put_work() deadlock fix (Pavel)" * tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block: tools/io_uring/io_uring-cp: sync with liburing example io_uring: fix ctx-exit io_rsrc_put_work() deadlock io_uring: drop ctx->uring_lock before flushing work item io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker() io-wq: fix bug of creating io-wokers unconditionally io_uring: rsrc ref lock needs to be IRQ safe io_uring: Use WRITE_ONCE() when writing to sq_flags io_uring: clear TIF_NOTIFY_SIGNAL when running task work	2021-08-13 13:25:08 -10:00
Hari Prasath	593f8c44cc	dt-bindings: net: macb: add documentation for sama5d29 ethernet interface Add documentation for SAMA5D29 ethernet interface. Signed-off-by: Hari Prasath <Hari.PrasathGE@microchip.com> Link: https://lore.kernel.org/r/20210812074422.13487-2-Hari.PrasathGE@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:12:32 -07:00
Hari Prasath	7d13ad5011	net: macb: Add PTP support for SAMA5D29 Add PTP capability to the macb config object for sama5d29. Signed-off-by: Hari Prasath <Hari.PrasathGE@microchip.com> Link: https://lore.kernel.org/r/20210812074422.13487-1-Hari.PrasathGE@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:12:32 -07:00
Joakim Zhang	b7cdc9658a	net: fec: add WoL support for i.MX8MQ By default FEC driver treat irq[0] (i.e. int0 described in dt-binding) as wakeup interrupt, but this situation changed on i.MX8M serials, SoC integration guys mix wakeup interrupt signal into int2 interrupt line. This patch introduces FEC_QUIRK_WAKEUP_FROM_INT2 to indicate int2 as wakeup interrupt for i.MX8MQ. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Link: https://lore.kernel.org/r/20210812070948.25797-1-qiangqing.zhang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:10:31 -07:00
Geert Uytterhoeven	44e5d08812	ravb: Remove checks for unsupported internal delay modes The EtherAVB instances on the R-Car E3/D3 and RZ/G2E SoCs do not support TX clock internal delay modes, and the EtherAVB driver prints a warning if an unsupported "rgmii-*id" PHY mode is specified, to catch buggy DTBs. Commit `a6f51f2efa` ("ravb: Add support for explicit internal clock delay configuration") deprecated deriving the internal delay mode from the PHY mode, in favor of explicit configuration using the now mandatory "rx-internal-delay-ps" and "tx-internal-delay-ps" properties, thus delegating the warning to the legacy fallback code. Since explicit configuration of a (valid) internal clock delay configuration is enforced by validating device tree source files against DT binding files, and all upstream DTS files have been converted as of commit `a5200e63af` ("arm64: dts: renesas: rzg2: Convert EtherAVB to explicit delay handling"), the checks in the legacy fallback code can be removed. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/2037542ac56e99413b9807e24049711553cc88a9.1628696778.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:08:02 -07:00
Pavel Skripkin	b06a1ffe17	net: hso: drop unused function argument _hso_serial_set_termios() doesn't use it's second argument, so it can be dropped. Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/20210811171321.18317-1-paskripkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:04:38 -07:00
Linus Torvalds	462938cd48	Pin control fixes for the v5.14 kernel series: - Fix the Kconfig dependency for Qualcomm SM8350 pin controller. - Fix pin biasing fallback behaviour on the Mediatek pin controller. - Fix the GPIO numbering scheme for Intel Tiger Lake-H to correspond to the products that are now actually out on the market. - Fix a pin control function itemization in the Sunxi driver out-of-bounds access bug. - Fix disable clocking for the RISC-V K210 pin controller on the errorpath. - Fix a system shutdown bug affecting AMD Ryzen-based laptops, the system would not suspend but just bounce back up. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmEWNmEACgkQQRCzN7AZ XXMicA/9EProLMt9OzkFKKtbXQ00cAOBRxTOp9XJA1Ch3Iih8N2ksRYsI4rHtETb BA869OyWKvZtUBoFMBp3TSGU3dx5nUI/pOgnP7pfeUQ6noIltD7lYBC3/sWu0YXg xV4zZQlONYKj+V0qJoYvmSeXzpAXtm4M0d/wQGNa2PJ5Zags/jbC1+MyPJqe7gHr BhmDb2TGdM5R4Qc+dMqUpW2pBvwHyNk8PgVZUqPeBHRlPzkeImBnxvmuJNPXgigD pPMLgcjhxX95Kvwu1H0xrx798UEe+T81wtsxq/HVbircvjjcT0luR3PAPTsgaJjz m58rlPIbIERi0azg5wiChIjdhu9EdXiVFSfMomj61Rk8M6not/XucA+TI2V8GS5B sJ6PW6odLyTYAXJI0zzFzz6nGplSqZglcwHnaPVOBzuOp3Y33mowWi58AfVYQu+x qPt0Enu42bAqa7mtEB3esA1TfVigw2/EU44N7xEM4AObP7m4XGJVYc+3LV6p4SJD 4aXz6Wd/BIzP9SeC7G94JKAPyIuRBNbc1jdWfFtn6mEAe50DiAw2HkW3sH5DwtiK VQhmEbpGzfuN0ndvJRNI669IHy+vJCwCRciq6TbTNGA9OVvkwA8crh1VEicSFn2G 0CopfTWUivvVmB5eu8Ma+DiVO9xeg3b+6Ipzy6UaCIObw0TUXvw= =Dd/A -----END PGP SIGNATURE----- Merge tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "An assortment of pin control fixes of varying importance, the most important ones affecting Intel and AMD laptops turned up the recent few days so it's time to push this to your tree. - Fix the Kconfig dependency for Qualcomm SM8350 pin controller - Fix pin biasing fallback behaviour on the Mediatek pin controller - Fix the GPIO numbering scheme for Intel Tiger Lake-H to correspond to the products that are now actually out on the market - Fix a pin control function itemization in the Sunxi driver out-of-bounds access bug - Fix disable clocking for the RISC-V K210 pin controller on the errorpath - Fix a system shutdown bug affecting AMD Ryzen-based laptops, the system would not suspend but just bounce back up" * tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: Fix an issue with shutdown when system set to s0ix pinctrl: k210: Fix k210_fpioa_probe() pinctrl: sunxi: Don't underestimate number of functions pinctrl: tigerlake: Fix GPIO mapping for newer version of software pinctrl: mediatek: Fix fallback behavior for bias_set_combo pinctrl: qcom: fix GPIOLIB dependencies	2021-08-13 12:41:45 -10:00
Changbin Du	afa79d08c6	net: in_irq() cleanup Replace the obsolete and ambiguos macro in_irq() with new macro in_hardirq(). Signed-off-by: Changbin Du <changbin.du@gmail.com> Link: https://lore.kernel.org/r/20210813145749.86512-1-changbin.du@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 14:09:19 -07:00
Jussi Maki	39a0876d59	net, bonding: Disallow vlan+srcmac with XDP The new vlan+srcmac xmit policy is not implementable with XDP since in many cases the 802.1Q payload is not present in the packet. This can be for example due to hardware offload or in the case of veth due to use of skbuffs internally. This also fixes the NULL deref with the vlan+srcmac xmit policy reported by Jonathan Toppins by additionally checking the skb pointer. Fixes: `a815bde56b` ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff") Reported-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20210812145241.12449-1-joamaki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 14:03:28 -07:00
Rao Shoaib	876c14ad01	af_unix: fix holding spinlock in oob handling syzkaller found that OOB code was holding spinlock while calling a function in which it could sleep. Reported-by: syzbot+8760ca6c1ee783ac4abd@syzkaller.appspotmail.com Fixes: `314001f0bf` ("af_unix: Add OOB support") Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Link: https://lore.kernel.org/r/20210811220652.567434-1-Rao.Shoaib@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:31:22 -07:00
Jakub Kicinski	9d5e6a7076	Merge branch 'bnxt-tx-napi-disabling-resiliency-improvements' Jakub Kicinski says: ==================== bnxt: Tx NAPI disabling resiliency improvements A lockdep warning was triggered by netpoll because napi poll was taking the xmit lock. Fix that and a couple more issues noticed while reading the code. ==================== Link: https://lore.kernel.org/r/20210812214242.578039-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:26:20 -07:00
Jakub Kicinski	fb9f719009	bnxt: count Tx drops Drivers should count packets they are dropping. Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:26:17 -07:00
Jakub Kicinski	e8d8c5d80f	bnxt: make sure xmit_more + errors does not miss doorbells skbs are freed on error and not put on the ring. We may, however, be in a situation where we're freeing the last skb of a batch, and there is a doorbell ring pending because of xmit_more() being true earlier. Make sure we ring the door bell in such situations. Since errors are rare don't pay attention to xmit_more() and just always flush the pending frames. The busy case should be safe to be left alone because it can only happen if start_xmit races with completions and they both enable the queue. In that case the kick can't be pending. Noticed while reading the code. Fixes: `4d172f21ce` ("bnxt_en: Implement xmit_more.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:26:17 -07:00
Jakub Kicinski	01cca6b933	bnxt: disable napi before canceling DIM napi schedules DIM, napi has to be disabled first, then DIM canceled. Noticed while reading the code. Fixes: `0bc0b97fca` ("bnxt_en: cleanup DIM work on device shutdown") Fixes: `6a8788f256` ("bnxt_en: add support for software dynamic interrupt moderation") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:26:17 -07:00
Jakub Kicinski	3c603136c9	bnxt: don't lock the tx queue from napi poll We can't take the tx lock from the napi poll routine, because netpoll can poll napi at any moment, including with the tx lock already held. The tx lock is protecting against two paths - the disable path, and (as Michael points out) the NETDEV_TX_BUSY case which may occur if NAPI completions race with start_xmit and both decide to re-enable the queue. For the disable/ifdown path use synchronize_net() to make sure closing the device does not race we restarting the queues. Annotate accesses to dev_state against data races. For the NAPI cleanup vs start_xmit path - appropriate barriers are already in place in the main spot where Tx queue is stopped but we need to do the same careful dance in the TX_BUSY case. Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:26:17 -07:00
Xie Yongji	cddce01160	nbd: Aovid double completion of a request There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: `96d97e1782` ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <jiangyadong@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 09:46:48 -06:00
Ilya Leoshkevich	3776f3517e	selftests, bpf: Test that dead ldx_w insns are accepted Prevent regressions related to zero-extension metadata handling during dead code sanitization. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210812151811.184086-3-iii@linux.ibm.com	2021-08-13 17:46:26 +02:00
Ilya Leoshkevich	45c709f8c7	bpf: Clear zext_dst of dead insns "access skb fields ok" verifier test fails on s390 with the "verifier bug. zext_dst is set, but no reg is defined" message. The first insns of the test prog are ... 0: 61 01 00 00 00 00 00 00 ldxw %r0,[%r1+0] 8: 35 00 00 01 00 00 00 00 jge %r0,0,1 10: 61 01 00 08 00 00 00 00 ldxw %r0,[%r1+8] ... and the 3rd one is dead (this does not look intentional to me, but this is a separate topic). sanitize_dead_code() converts dead insns into "ja -1", but keeps zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such an insn, it sees this discrepancy and bails. This problem can be seen only with JITs whose bpf_jit_needs_zext() returns true. Fix by clearning dead insns' zext_dst. The commits that contributed to this problem are: 1. `5aa5bd14c5` ("bpf: add initial suite for selftests"), which introduced the test with the dead code. 2. `5327ed3d44` ("bpf: verifier: mark verified-insn with sub-register zext flag"), which introduced the zext_dst flag. 3. `83a2881903` ("bpf: Account for BPF_FETCH in insn_has_def32()"), which introduced the sanity check. 4. `9183671af6` ("bpf: Fix leakage under speculation on mispredicted branches"), which bisect points to. It's best to fix this on stable branches that contain the second one, since that's the point where the inconsistency was introduced. Fixes: `5327ed3d44` ("bpf: verifier: mark verified-insn with sub-register zext flag") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com	2021-08-13 17:43:43 +02:00
Jens Axboe	8f40d03707	tools/io_uring/io_uring-cp: sync with liburing example This example is missing a few fixes that are in the liburing version, synchronize with the upstream version. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 08:58:11 -06:00
Yu Kuai	454bb67752	blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED We run a test that delete and recover devcies frequently(two devices on the same host), and we found that 'active_queues' is super big after a period of time. If device a and device b share a tag set, and a is deleted, then blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there is only one queue that are using the tag set. However, if b is still active, the active_queues of b might never be cleared even if b is deleted. Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210731062130.1533893-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 08:01:34 -06:00
Jakub Kicinski	f4083a752a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h `9e26680733` ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") `9e518f2580` ("bnxt_en: 1PPS functions to configure TSIO pins") `099fdeda65` ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h `a2baf4e8bb` ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") `c7603cfa04` ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c `5957cc557d` ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") `2d0b41a376` ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS `7b637cd52f` ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") `7d901a1e87` ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 06:41:22 -07:00
Thomas Gleixner	7a3dc4f35b	driver core: Add missing kernel doc for device::msi_lock Fixes: `77e89afc25` ("PCI/MSI: Protect msi_desc::masked for multi-MSI") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2021-08-13 12:38:48 +02:00
Paolo Bonzini	6e949ddb0a	Merge branch 'kvm-tdpmmu-fixes' into kvm-master Merge topic branch with fixes for both 5.14-rc6 and 5.15.	2021-08-13 03:33:13 -04:00
Sean Christopherson	ce25681d59	KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock Add yet another spinlock for the TDP MMU and take it when marking indirect shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with nested TDP, KVM may encounter shadow pages for the TDP entries managed by L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and misbehaves when a shadow page is marked unsync via a TDP MMU page fault, which runs with mmu_lock held for read, not write. Lack of a critical section manifests most visibly as an underflow of unsync_children in clear_unsync_child_bit() due to unsync_children being corrupted when multiple CPUs write it without a critical section and without atomic operations. But underflow is the best case scenario. The worst case scenario is that unsync_children prematurely hits '0' and leads to guest memory corruption due to KVM neglecting to properly sync shadow pages. Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock would functionally be ok. Usurping the lock could degrade performance when building upper level page tables on different vCPUs, especially since the unsync flow could hold the lock for a comparatively long time depending on the number of indirect shadow pages and the depth of the paging tree. For simplicity, take the lock for all MMUs, even though KVM could fairly easily know that mmu_lock is held for write. If mmu_lock is held for write, there cannot be contention for the inner spinlock, and marking shadow pages unsync across multiple vCPUs will be slow enough that bouncing the kvm_arch cacheline should be in the noise. Note, even though L2 could theoretically be given access to its own EPT entries, a nested MMU must hold mmu_lock for write and thus cannot race against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to be taken by the TDP MMU, as opposed to being taken by any MMU for a VM that is running with the TDP MMU enabled. Holding mmu_lock for read also prevents the indirect shadow page from being freed. But as above, keep it simple and always take the lock. Alternative #1, the TDP MMU could simply pass "false" for can_unsync and effectively disable unsync behavior for nested TDP. Write protecting leaf shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such VMMs typically don't modify TDP entries, but the same may not hold true for non-standard use cases and/or VMMs that are migrating physical pages (from L1's perspective). Alternative #2, the unsync logic could be made thread safe. In theory, simply converting all relevant kvm_mmu_page fields to atomics and using atomic bitops for the bitmap would suffice. However, (a) an in-depth audit would be required, (b) the code churn would be substantial, and (c) legacy shadow paging would incur additional atomic operations in performance sensitive paths for no benefit (to legacy shadow paging). Fixes: `a2855afc7e` ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181815.3378104-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:32:14 -04:00
Sean Christopherson	0103098fb4	KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs Set the min_level for the TDP iterator at the root level when zapping all SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf SPTE will recursively zap all its children, thus there is no need for the iterator to attempt to step down. This avoids rereading the top-level SPTEs after they are zapped by causing try_step_down() to short-circuit. In most cases, optimizing try_step_down() will be in the noise as the cost of zapping SPTEs completely dominates the overall time. The optimization is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM is zapping in response to multiple memslot updates when userspace is adding and removing read-only memslots for option ROMs. In that case, the task doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock for read and thus can be a noisy neighbor of sorts. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:31:56 -04:00
Sean Christopherson	524a1e4e38	KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level paging, the "zap all" flows will skip top-level SPTEs whose range extends beyond shadow_phys_bits and leak their SPs when the VM is destroyed. Use the current upper bound (based on host.MAXPHYADDR) to detect that the caller wants to zap all SPTEs, e.g. instead of using the max theoretical gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator to terminate its walk earlier when running on hosts with MAXPHYADDR < 52. Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to help future debuggers should KVM decide to leak SPTEs again. The bug is most easily reproduced by running (and unloading!) KVM in a VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped. ============================================================================= BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab\|zone=1) CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack_lvl+0x45/0x59 slab_err+0x95/0xc9 __kmem_cache_shutdown.cold+0x3c/0x158 kmem_cache_destroy+0x3d/0xf0 kvm_mmu_module_exit+0xa/0x30 [kvm] kvm_arch_exit+0x5d/0x90 [kvm] kvm_exit+0x78/0x90 [kvm] vmx_exit+0x1a/0x50 [kvm_intel] __x64_sys_delete_module+0x13f/0x220 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `faaf05b00a` ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:31:46 -04:00
Paolo Bonzini	c5e2bf0b4a	KVM/arm64 fixes for 5.14, take #2 - Plug race between enabling MTE and creating vcpus - Fix off-by-one bug when checking whether an address range is RAM -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmEWEsoPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpD1IIQAIbZdNAIy68j2/H8sgaYT4GuYICLOvz3WhTI Li/yRP2b0th4wT4LaKlATKJKQgliPxXZ0KCJMZxFr7aiKEyY1LZe+ddJBzetzgy2 S12v5V3cp/0DHQ6CEflUy0x8gM/BeudeYyZcHxSbLZcVB4bzFx9pBJeJ1WkLG+GC Bx4zxdARNas+9zOUuHLCQbWfihMSrbj3CI6WIafpNeFOs3lLldT8WcRofgQfAsAx V3FKETIOb5NUU6LKUHkYgyM3n1MZwAukaCsepDhayeeT5iEyIGXb1HkjcYOx6bfn BhDvA7PH9oXBOFFL2sxlJKamXWZP3Bz7xyZ40MXDqC1lSMAUEh8TXJFptncEDxPb OgXewTgCulKVSjT8YXnoTe1UNQ2dLqjw1TsqV5jXhVXIjeBcR8S4gM0hcqwvgWlO BHaDt8BPd39rBzfC0gUkE5BHE04QuboK/Vz/+Qc6Slc3EUIdnuCtjefdRLvSxxgB bEBW+s3zcZ7RhoSLvXgvTe3an11Os8BH921VCxgMyEnIvSDEbw3KypmPYuNCkSLc t9GLAbPU139w7Gk7vp0oqhI8xIV7QoFk+b94JIHMvtS13yVaqBrZF33RrFzmAwVN lXDiOdoR8mqbX2EPQVIn+BhSlebfvnJANm46tzgY1/u2mUgH//fu/cH3kpjgohco kY+Ztnb9 =hL2s -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.14, take #2 - Plug race between enabling MTE and creating vcpus - Fix off-by-one bug when checking whether an address range is RAM	2021-08-13 03:21:13 -04:00
Sean Christopherson	18712c1370	KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults. Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit `a0c134347b` ("KVM: VMX: introduce vmx_need_pf_intercept"). Fixes: `1dbf5d68af` ("KVM: VMX: Add guest physical address check in EPT violation and misconfig") Cc: stable@vger.kernel.org Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812045615.3167686-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:20:58 -04:00
Junaid Shahid	85aa8889b8	kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault When a nested EPT violation/misconfig is injected into the guest, the shadow EPT PTEs associated with that address need to be synced. This is done by kvm_inject_emulated_page_fault() before it calls nested_ept_inject_page_fault(). However, that will only sync the shadow EPT PTE associated with the current L1 EPTP. Since the ASID is based on EP4TA rather than the full EPTP, so syncing the current EPTP is not enough. The SPTEs associated with any other L1 EPTPs in the prev_roots cache with the same EP4TA also need to be synced. Signed-off-by: Junaid Shahid <junaids@google.com> Message-Id: <20210806222229.1645356-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:20:58 -04:00
Paolo Bonzini	375d1adebc	Merge branch 'kvm-vmx-secctl' into kvm-master Merge common topic branch for 5.14-rc6 and 5.15 merge window.	2021-08-13 03:20:18 -04:00

... 4 5 6 7 8 ...

1032326 Commits All Branches Search

1032326 Commits

All Branches