OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Kairui Song	b7e9b568c2	libfdt: include fdt_addresses.c Upstream: `c273a2bd8a` Link: `887436bdb7` commit `c273a2bd8a` Author: AKASHI Takahiro <takahiro.akashi@linaro.org> Date: Mon Dec 9 12:03:44 2019 +0900 libfdt: include fdt_addresses.c In the implementation of kexec_file_loaded-based kdump for arm64, fdt_appendprop_addrrange() will be needed. So include fdt_addresses.c in making libfdt. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:51:02 +08:00
Kairui Song	6fb78c4cc6	arm64: kdump: remove dependency on arm64_dma32_phys_limit From: Yi Li <adamliyi@msn.com> Link: `696027f109` The patch `b2da6ad294` (arm64: kdump: reimplement crashkernel=X) depends on commit `1a8e1cef76` ("arm64: use both ZONE_DMA and ZONE_DMA32"). Commit `1a8e1cef76` is not ported to 5.4 kernel. So use arm64_dma_phys_limit. Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:51:01 +08:00
Kairui Song	fd02a1b5bc	kdump: update Documentation about crashkernel From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `023deaec32` For arm64, the behavior of crashkernel=X has been changed, which tries low allocation in DMA zone or DMA32 zone if CONFIG_ZONE_DMA is disabled, and fall back to high allocation if it fails. We can also use "crashkernel=X,high" to select a high region above DMA zone, which also tries to allocate at least 256M low memory in DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled). "crashkernel=Y,low" can be used to allocate specified size low memory. So update the Documentation. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:51:01 +08:00
Kairui Song	3fd41ff677	arm64: kdump: add memory for devices by DT property linux,usable-memory-range From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `2012a3b392` When reserving crashkernel in high memory, some low memory is reserved for crash dump kernel devices and never mapped by the first kernel. This memory range is advertised to crash dump kernel via DT property under /chosen, linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]> We reused the DT property linux,usable-memory-range and made the low memory region as the second range "BASE2 SIZE2", which keeps compatibility with existing user-space and older kdump kernels. Crash dump kernel reads this property at boot time and call memblock_add() to add the low memory region after memblock_cap_memory_range() has been called. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:51:01 +08:00
Kairui Song	d001dccf2b	x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `c8013ee6cd` We make the functions reserve_crashkernel[_low]() as generic for x86 and arm64. Since reserve_crashkernel[_low]() implementations are quite similar on other architectures as well, we can have more users of this later. So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and select this by X86 and ARM64. Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:51:00 +08:00
Kairui Song	bd482067c3	arm64: kdump: reimplement crashkernel=X From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `70e586365f` There are the following issues in arm64 kdump: 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail when there is not enough low memory. 2. If reserving crashkernel above 4G, the crash dump kernel will fail to boot because there is no low memory available for allocation. 3. Since commit `1a8e1cef76` ("arm64: use both ZONE_DMA and ZONE_DMA32"), if the memory reserved for the crash dump kernel falls in ZONE_DMA32, devices in the crash dump kernel that need to use ZONE_DMA will fail to allocate. To solve these issues, change the behavior of crashkernel=X and introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation in the DMA zone (or the DMA32 zone if CONFIG_ZONE_DMA is disabled), and falls back to high allocation if it fails. We can also use "crashkernel=X,high" to select a region above the DMA zone, which also tries to allocate at least 256M in the DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled). "crashkernel=Y,low" can be used to allocate low memory of a specified size. Another minor change: there may be two regions reserved for the crash dump kernel. To distinguish the low region from the high region without affecting existing kexec-tools, rename the low region to "Crash kernel (low)". Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:51:00 +08:00
Kairui Song	f30355b620	arm64: kdump: introduce some macroes for crash kernel reservation From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `667118f8c1` Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX for the upper bound of low crash memory, and macro CRASH_ADDR_HIGH_MAX for the upper bound of high crash memory, and use these macros instead. Besides, to keep consistent with x86, use CRASH_ALIGN as the lower bound of crash kernel reservation. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:51:00 +08:00
Kairui Song	21ff8ff8f3	x86/elf: Move vmcore_elf_check_arch_cross to arch/x86/include/asm/elf.h From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `b332ab8970` Move macro vmcore_elf_check_arch_cross from arch/x86/include/asm/kexec.h to arch/x86/include/asm/elf.h to fix the following compiling warning: In file included from arch/x86/kernel/setup.c:39:0: ./arch/x86/include/asm/kexec.h:77:0: warning: "vmcore_elf_check_arch_cross" redefined # define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) In file included from arch/x86/kernel/setup.c:9:0: ./include/linux/crash_dump.h:39:0: note: this is the location of the previous definition #define vmcore_elf_check_arch_cross(x) 0 The root cause is that vmcore_elf_check_arch_cross under CONFIG_CRASH_CORE depend on CONFIG_KEXEC_CORE. Commit 532b66d2279d ("x86: kdump: move reserve_crashkernel[_low]() into crash_core.c") triggered the issue. Suggested by Mike, simply move vmcore_elf_check_arch_cross from arch/x86/include/asm/kexec.h to arch/x86/include/asm/elf.h to fix the warning. Fixes: 532b66d2279d ("x86: kdump: move reserve_crashkernel[_low]() into crash_core.c") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:50:59 +08:00
Kairui Song	3177fa46ec	x86: kdump: move reserve_crashkernel[_low]() into crash_core.c From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `8cb8686864` Make the functions reserve_crashkernel[_low]() as generic. Arm64 will use these to reimplement crashkernel=X. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:50:59 +08:00
Kairui Song	e5eac006f6	x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch() From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `8ec4a816f2` We will make the functions reserve_crashkernel() as generic, the xen_pv_domain() check in reserve_crashkernel() is relevant only to x86, the same as insert_resource() in reserve_crashkernel[_low](). So move xen_pv_domain() check and insert_resource() to setup_arch() to keep them in x86. Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:50:59 +08:00
Kairui Song	2fbb10e99b	x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel() From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `a2e0b4351d` To make the functions reserve_crashkernel() as generic, replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:50:58 +08:00
Kairui Song	cc6803d7d8	x86: kdump: make the lower bound of crash kernel reservation consistent From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `8882ba540e` The lower bounds of crash kernel reservation and crash kernel low reservation are different, use the consistent value CRASH_ALIGN. Suggested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:50:58 +08:00
Kairui Song	b3ab0276fe	x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN From: Chen Zhou <chenzhou10@huawei.com> Link: https://lkml.org/lkml/2021/1/30/53 Link: `873384fe79` Move CRASH_ALIGN to header asm/kexec.h for later use. Besides, the alignment of crash kernel regions in x86 is 16M(CRASH_ALIGN), but function reserve_crashkernel() also used 1M alignment. So just replace hard-coded alignment 1M with macro CRASH_ALIGN. Suggested-by: Dave Young <dyoung@redhat.com> Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:50:57 +08:00
Kairui Song	47c1cc9217	arm64: remove the hard coded crashkernel address limit This conflicts with upstream's kdump high reservation support, and we already have CONFIG_ZONE_DMA32 set, so we have: ARCH_LOW_ADDRESS_LIMIT = min(offset + (1ULL << 32), memblock_end_of_DRAM()); Which limits the address below 4G, so this hard code limit is redundant. Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:50:57 +08:00
Alex Shi	6bc9581ddd	Revert "gup: document and work around "COW can break either way" issue" This reverts commit 918f50807eccd63d482ef4cf778b1d2b416770a9. That commit forces COW into the write model, which forces COW breaking and greatly increases page usage. Upstream, commit `376a34efa` ("mm/gup: refactor and de-duplicate gup_fast() code") provides another way to fix the fork security issue of COW, and the buggy commit was then reverted by commit `a308c71bf1` ("mm/gup: Remove enforced COW mechanism") Signed-off-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:50:57 +08:00
Peter Xu	2453865ed4	mm/ksm: Remove reuse_ksm_page() Remove the function as the last reference has gone away with the do_wp_page() changes. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `1a0cf26323`) Signed-off-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:50:56 +08:00
Linus Torvalds	0fb4d8fd75	mm: do_wp_page() simplification commit `09854ba94c` upstream How about we just make sure we're the only possible valid user of the page before we bother to reuse it? Simplify, simplify, simplify. And get rid of the nasty serialization on the page lock at the same time. [peterx: add subject prefix] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `09854ba94c`) Signed-off-by: Alex Shi <alexsshi@tencent.com> Conflicts: mm/memory.c	2024-06-11 20:50:56 +08:00
Yuehong Wu	85ba10e6ef	config: enable BFQ io scheduler Enable CONFIG_IOSCHED_BFQ,CONFIG_BFQ_GROUP_IOSCHED for ARM to support bfq io-scheduler. Signed-off-by: Yuehong Wu <yuehongwu@tencent.com> Signed-off-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:50:53 +08:00
Ni Xun	3c048b0f89	config: change CONFIG_CONFIGFS_FS to Y for default conf Change CONFIG_CONFIGFS_FS from M to Y for the arm default config. Signed-off-by: Ni Xun <richardni@tencent.com>	2024-06-11 20:50:40 +08:00
KP Singh	5e0977fd08	security: Fix hook iteration for secid_to_secctx [upstream commit `0550cfe8c2`] secid_to_secctx is not stackable, and since the BPF LSM registers this hook by default, the call_int_hook logic is not suitable which "bails-on-fail" and causes issues when other LSMs register this hook and eventually breaks Audit. In order to fix this, directly iterate over the security hooks instead of using call_int_hook as suggested in: https://lore.kernel.org/bpf/9d0eb6c6-803a-ff3a-5603-9ad6d9edfc00@schaufler-ca.com/#t Fixes: `98e828a065` ("security: Refactor declaration of LSM hooks") Fixes: `625236ba38` ("security: Fix the default value of secid_to_secctx hook") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/bpf/20200520125616.193765-1-kpsingh@chromium.org Signed-off-by: Menglong Dong <imagedong@tencent.com>	2024-06-11 20:50:15 +08:00
soonflywang	f1f1da34d4	arm64: fix NEON/VFP reentrant in fast_copy_page Add a fixup in fast_copy_page. This feature is disabled by default; set vm.fast_copy_page_enabled to enable it. Signed-off-by: soonflywang <soonflywang@tencent.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: robinlai <robinlai@tencent.com>	2024-06-11 20:50:14 +08:00
soonflywang	ccb2a062d5	arm64: implemented a fast copy_page version while NEON/VFP is met Arm server CPUs usually have the NEON/VFP extension. This patch leverages SIMD instructions to speed up the current copy_page(). Signed-off-by: soonflywang <soonflywang@tencent.com> Signed-off-by: Chengdong Li <chengdongli@tencent.com> Reviewed-by: robinlai <robinlai@tencent.com>	2024-06-11 20:50:14 +08:00
Anders Roxell	c7ff1ae2e7	security: Fix the default value of secid_to_secctx hook Upstream commit `625236ba38` security_secid_to_secctx is called by the bpf_lsm hook and a successful return value (i.e 0) implies that the parameter will be consumed by the LSM framework. The current behaviour return success when the pointer isn't initialized when CONFIG_BPF_LSM is enabled, with the default return from kernel/bpf/bpf_lsm.c. This is the internal error: [ 1229.341488][ T2659] usercopy: Kernel memory exposure attempt detected from null address (offset 0, size 280)! [ 1229.374977][ T2659] ------------[ cut here ]------------ [ 1229.376813][ T2659] kernel BUG at mm/usercopy.c:99! [ 1229.378398][ T2659] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 1229.380348][ T2659] Modules linked in: [ 1229.381654][ T2659] CPU: 0 PID: 2659 Comm: systemd-journal Tainted: G B W 5.7.0-rc5-next-20200511-00019-g864e0c6319b8-dirty #13 [ 1229.385429][ T2659] Hardware name: linux,dummy-virt (DT) [ 1229.387143][ T2659] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--) [ 1229.389165][ T2659] pc : usercopy_abort+0xc8/0xcc [ 1229.390705][ T2659] lr : usercopy_abort+0xc8/0xcc [ 1229.392225][ T2659] sp : ffff000064247450 [ 1229.393533][ T2659] x29: ffff000064247460 x28: 0000000000000000 [ 1229.395449][ T2659] x27: 0000000000000118 x26: 0000000000000000 [ 1229.397384][ T2659] x25: ffffa000127049e0 x24: ffffa000127049e0 [ 1229.399306][ T2659] x23: ffffa000127048e0 x22: ffffa000127048a0 [ 1229.401241][ T2659] x21: ffffa00012704b80 x20: ffffa000127049e0 [ 1229.403163][ T2659] x19: ffffa00012704820 x18: 0000000000000000 [ 1229.405094][ T2659] x17: 0000000000000000 x16: 0000000000000000 [ 1229.407008][ T2659] x15: 0000000000000000 x14: 003d090000000000 [ 1229.408942][ T2659] x13: ffff80000d5b25b2 x12: 1fffe0000d5b25b1 [ 1229.410859][ T2659] x11: 1fffe0000d5b25b1 x10: ffff80000d5b25b1 [ 1229.412791][ T2659] x9 : ffffa0001034bee0 x8 : ffff00006ad92d8f [ 1229.414707][ T2659] x7 : 0000000000000000 x6 : 
ffffa00015eacb20 [ 1229.416642][ T2659] x5 : ffff0000693c8040 x4 : 0000000000000000 [ 1229.418558][ T2659] x3 : ffffa0001034befc x2 : d57a7483a01c6300 [ 1229.420610][ T2659] x1 : 0000000000000000 x0 : 0000000000000059 [ 1229.422526][ T2659] Call trace: [ 1229.423631][ T2659] usercopy_abort+0xc8/0xcc [ 1229.425091][ T2659] __check_object_size+0xdc/0x7d4 [ 1229.426729][ T2659] put_cmsg+0xa30/0xa90 [ 1229.428132][ T2659] unix_dgram_recvmsg+0x80c/0x930 [ 1229.429731][ T2659] sock_recvmsg+0x9c/0xc0 [ 1229.431123][ T2659] ____sys_recvmsg+0x1cc/0x5f8 [ 1229.432663][ T2659] ___sys_recvmsg+0x100/0x160 [ 1229.434151][ T2659] __sys_recvmsg+0x110/0x1a8 [ 1229.435623][ T2659] __arm64_sys_recvmsg+0x58/0x70 [ 1229.437218][ T2659] el0_svc_common.constprop.1+0x29c/0x340 [ 1229.438994][ T2659] do_el0_svc+0xe8/0x108 [ 1229.440587][ T2659] el0_svc+0x74/0x88 [ 1229.441917][ T2659] el0_sync_handler+0xe4/0x8b4 [ 1229.443464][ T2659] el0_sync+0x17c/0x180 [ 1229.444920][ T2659] Code: aa1703e2 aa1603e1 910a8260 97ecc860 (d4210000) [ 1229.447070][ T2659] ---[ end trace 400497d91baeaf51 ]--- [ 1229.448791][ T2659] Kernel panic - not syncing: Fatal exception [ 1229.450692][ T2659] Kernel Offset: disabled [ 1229.452061][ T2659] CPU features: 0x240002,20002004 [ 1229.453647][ T2659] Memory Limit: none [ 1229.455015][ T2659] ---[ end Kernel panic - not syncing: Fatal exception ]--- Rework the so the default return value is -EOPNOTSUPP. There are likely other callbacks such as security_inode_getsecctx() that may have the same problem, and that someone that understand the code better needs to audit them. Thank you Arnd for helping me figure out what went wrong. 
Fixes: `98e828a065` ("security: Refactor declaration of LSM hooks") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Chun Liu <kaicliu@tencent.com> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/bpf/20200512174607.9630-1-anders.roxell@linaro.org	2024-06-11 20:49:57 +08:00
Xinghui Li	5c899e5403	firmware: fix one UAF issue There could be a use-after-free issue in dmi_sysfs_register_handle. While handling specializations, entry->child could be freed if an error occurs. However, kobject_put is still called on it after the free. So, set entry->child to NULL to avoid this case. Reported-by: loydlv <loydlv@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:49:56 +08:00
Xinghui Li	84daaf3511	media:cec:fix double free and uaf issue when cancel data during noblocking data could be freed before the transmit completes if the opt is nonblocking. In this case, the regular free could lead to a double free. So, add the return value '-EPERM' to mark this case. Reported-by: loydlv <loydlv@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:49:56 +08:00
Lv Yunlong	ca589c18a1	gpu/xen: Fix a use after free in xen_drm_drv_init commit `52762efa2b` upstream. In function displback_changed, has the call chain displback_connect(front_info)->xen_drm_drv_init(front_info). We can see that drm_info is assigned to front_info->drm_info and drm_info is freed in fail branch in xen_drm_drv_init(). Later displback_disconnect(front_info) is called and it calls xen_drm_drv_fini(front_info) cause a use after free by drm_info = front_info->drm_info statement. My patch has done two things. First fixes the fail label which drm_info = kzalloc() failed and still free the drm_info. Second sets front_info->drm_info to NULL to avoid uaf. Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210323014656.10068-1-lyl2019@mail.ustc.edu.cn Signed-off-by: Xinghui Li <korantli@tencent.com> Reviewed-by: Robinlai <robinlai@tencent.com>	2024-06-11 20:49:56 +08:00
Vinicius Costa Gomes	627ec74d6a	igc: Fix use-after-free error during reset upstream commit: `56ea7ed103` Cleans the next descriptor to watch (next_to_watch) when cleaning the TX ring. Failure to do so can cause invalid memory accesses. If igc_poll() runs while the controller is being reset this can lead to the driver try to free a skb that was already freed. Log message: [ 101.525242] refcount_t: underflow; use-after-free. [ 101.525251] WARNING: CPU: 1 PID: 646 at lib/refcount.c:28 refcount_warn_saturate+0xab/0xf0 [ 101.525259] Modules linked in: sch_etf(E) sch_mqprio(E) rfkill(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) binfmt_misc(E) kvm_intel(E) kvm(E) irqbypass(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) mei_wdt(E) libaes(E) crypto_simd(E) cryptd(E) glue_helper(E) snd_hda_codec_hdmi(E) rapl(E) intel_cstate(E) snd_hda_intel(E) snd_intel_dspcfg(E) sg(E) soundwire_intel(E) intel_uncore(E) at24(E) soundwire_generic_allocation(E) iTCO_wdt(E) soundwire_cadence(E) intel_pmc_bxt(E) serio_raw(E) snd_hda_codec(E) iTCO_vendor_support(E) watchdog(E) snd_hda_core(E) snd_hwdep(E) snd_soc_core(E) snd_compress(E) snd_pcsp(E) soundwire_bus(E) snd_pcm(E) evdev(E) snd_timer(E) mei_me(E) snd(E) soundcore(E) mei(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) i915(E) ahci(E) libahci(E) ehci_pci(E) igb(E) xhci_pci(E) ehci_hcd(E) [ 101.525303] drm_kms_helper(E) dca(E) xhci_hcd(E) libata(E) crct10dif_pclmul(E) cec(E) crct10dif_common(E) tsn(E) igc(E) e1000e(E) ptp(E) i2c_i801(E) crc32c_intel(E) psmouse(E) i2c_algo_bit(E) i2c_smbus(E) scsi_mod(E) lpc_ich(E) pps_core(E) usbcore(E) drm(E) button(E) video(E) [ 101.525318] CPU: 1 PID: 646 Comm: irq/37-enp7s0-T Tainted: G E 5.10.30-rt37-tsn1-rt-ipipe #ipipe [ 101.525320] Hardware name: SIEMENS AG SIMATIC IPC427D/A5E31233588, BIOS V17.02.09 03/31/2017 [ 101.525322] 
RIP: 0010:refcount_warn_saturate+0xab/0xf0 [ 101.525325] Code: 05 31 48 44 01 01 e8 f0 c6 42 00 0f 0b c3 80 3d 1f 48 44 01 00 75 90 48 c7 c7 78 a8 f3 a6 c6 05 0f 48 44 01 01 e8 d1 c6 42 00 <0f> 0b c3 80 3d fe 47 44 01 00 0f 85 6d ff ff ff 48 c7 c7 d0 a8 f3 [ 101.525327] RSP: 0018:ffffbdedc0917cb8 EFLAGS: 00010286 [ 101.525329] RAX: 0000000000000000 RBX: ffff98fd6becbf40 RCX: 0000000000000001 [ 101.525330] RDX: 0000000000000001 RSI: ffffffffa6f2700c RDI: 00000000ffffffff [ 101.525332] RBP: ffff98fd6becc14c R08: ffffffffa7463d00 R09: ffffbdedc0917c50 [ 101.525333] R10: ffffffffa74c3578 R11: 0000000000000034 R12: 00000000ffffff00 [ 101.525335] R13: ffff98fd6b0b1000 R14: 0000000000000039 R15: ffff98fd6be35c40 [ 101.525337] FS: 0000000000000000(0000) GS:ffff98fd6e240000(0000) knlGS:0000000000000000 [ 101.525339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 101.525341] CR2: 00007f34135a3a70 CR3: 0000000150210003 CR4: 00000000001706e0 [ 101.525343] Call Trace: [ 101.525346] sock_wfree+0x9c/0xa0 [ 101.525353] unix_destruct_scm+0x7b/0xa0 [ 101.525358] skb_release_head_state+0x40/0x90 [ 101.525362] skb_release_all+0xe/0x30 [ 101.525364] napi_consume_skb+0x57/0x160 [ 101.525367] igc_poll+0xb7/0xc80 [igc] [ 101.525376] ? sched_clock+0x5/0x10 [ 101.525381] ? sched_clock_cpu+0xe/0x100 [ 101.525385] net_rx_action+0x14c/0x410 [ 101.525388] __do_softirq+0xe9/0x2f4 [ 101.525391] __local_bh_enable_ip+0xe3/0x110 [ 101.525395] ? irq_finalize_oneshot.part.47+0xe0/0xe0 [ 101.525398] irq_forced_thread_fn+0x6a/0x80 [ 101.525401] irq_thread+0xe8/0x180 [ 101.525403] ? wake_threads_waitq+0x30/0x30 [ 101.525406] ? irq_thread_check_affinity+0xd0/0xd0 [ 101.525408] kthread+0x183/0x1a0 [ 101.525412] ? 
kthread_park+0x80/0x80 [ 101.525415] ret_from_fork+0x22/0x30 Fixes: `13b5b7fd6a` ("igc: Add support for Tx/Rx rings") Reported-by: Erez Geva <erez.geva.ext@siemens.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: jackjunliu <jackjunliu@tencent.com>	2024-06-11 20:49:55 +08:00
Sasha Neftin	b22ddb4543	igc: Remove _I_PHY_ID checking upstream commit: `7c496de538` i225 devices have only one PHY vendor. There is no point checking _I_PHY_ID during the link establishment and auto-negotiation process. This patch comes to clean up these pointless checkings. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: jackjunliu <jackjunliu@tencent.com>	2024-06-11 20:49:55 +08:00
Ni Xun	8782b3a1b0	config: change CONFIG_CONFIGFS_FS to Y for AARCH64 With CONFIG_CONFIGFS_FS=m, the default cpu cgroup has user.slice, which slows down the unixbench Pipe-based Context Switching score: 39660 -> 19626 Signed-off-by: Ni Xun <richardni@tencent.com>	2024-06-11 20:49:55 +08:00
johnnyaiai	45a150cd8d	ARM64/conf: Disable CONFIG_RODATA_FULL_DEFAULT_ENABLED [tapd] ID877978657 This configuration resulted in a 15% regression on unixbench's execl testing. This additional enhancement can be turned on with rodata=full after this patch. Signed-off-by: johnnyaiai <johnnyaiai@tencent.com> Reviewed-by: robinlai <robinlai@tencent.com>	2024-06-11 20:49:52 +08:00
Hangyu Hua	3faa1c1ccc	xfrm: xfrm_policy: fix a possible double xfrm_pols_put() in xfrm_bundle_lookup() upstream commit: `f85daf0e72` xfrm_policy_lookup() will call xfrm_pol_hold_rcu() to get a refcount of pols[0]. This refcount can be dropped in xfrm_expand_policies() when xfrm_expand_policies() returns an error. pols[0]'s refcount is balanced here. But xfrm_bundle_lookup() will also call xfrm_pols_put() with num_pols == 1 to drop this refcount when xfrm_expand_policies() returns an error. This patch also fixes an illegal address access. pols[0] will hold an error pointer when xfrm_policy_lookup fails. This leads xfrm_pols_put to dereference an illegal address in xfrm_bundle_lookup's error path. Fix these by setting num_pols = 0 in xfrm_expand_policies()'s error path. Fixes: `80c802f307` ("xfrm: cache bundles instead of policies for outgoing flows") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-06-11 20:49:23 +08:00
Duoming Zhou	b96175c56f	NFC: netlink: fix sleep in atomic bug when firmware download timeout upstream commit: `4071bf121d` There are sleep in atomic bug that could cause kernel panic during firmware download process. The root cause is that nlmsg_new with GFP_KERNEL parameter is called in fw_dnld_timeout which is a timer handler. The call trace is shown below: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265 Call Trace: kmem_cache_alloc_node __alloc_skb nfc_genl_fw_download_done call_timer_fn __run_timers.part.0 run_timer_softirq __do_softirq ... The nlmsg_new with GFP_KERNEL parameter may sleep during memory allocation process, and the timer handler is run as the result of a "software interrupt" that should not call any other function that could sleep. This patch changes allocation mode of netlink message from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic bug. The GFP_ATOMIC flag makes memory allocation operation could be used in atomic context. Fixes: `9674da8759` ("NFC: Add firmware upload netlink command") Fixes: `9ea7187c53` ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220504055847.38026-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:49:23 +08:00
Duoming Zhou	154c293fc8	net: rose: fix UAF bugs caused by timer handler upstream commit: `9cc02ede69` There are UAF bugs in rose_heartbeat_expiry(), rose_timer_expiry() and rose_idletimer_expiry(). The root cause is that del_timer() could not stop the timer handler that is running and the refcount of sock is not managed properly. One of the UAF bugs is shown below: (thread 1) \| (thread 2) \| rose_bind \| rose_connect \| rose_start_heartbeat rose_release \| (wait a time) case ROSE_STATE_0 \| rose_destroy_socket \| rose_heartbeat_expiry rose_stop_heartbeat \| sock_put(sk) \| ... sock_put(sk) // FREE \| \| bh_lock_sock(sk) // USE The sock is deallocated by sock_put() in rose_release() and then used by bh_lock_sock() in rose_heartbeat_expiry(). Although rose_destroy_socket() calls rose_stop_heartbeat(), it could not stop the timer that is running. The KASAN report triggered by POC is shown below: BUG: KASAN: use-after-free in _raw_spin_lock+0x5a/0x110 Write of size 4 at addr ffff88800ae59098 by task swapper/3/0 ... Call Trace: <IRQ> dump_stack_lvl+0xbf/0xee print_address_description+0x7b/0x440 print_report+0x101/0x230 ? irq_work_single+0xbb/0x140 ? _raw_spin_lock+0x5a/0x110 kasan_report+0xed/0x120 ? _raw_spin_lock+0x5a/0x110 kasan_check_range+0x2bd/0x2e0 _raw_spin_lock+0x5a/0x110 rose_heartbeat_expiry+0x39/0x370 ? rose_start_heartbeat+0xb0/0xb0 call_timer_fn+0x2d/0x1c0 ? 
rose_start_heartbeat+0xb0/0xb0 expire_timers+0x1f3/0x320 __run_timers+0x3ff/0x4d0 run_timer_softirq+0x41/0x80 __do_softirq+0x233/0x544 irq_exit_rcu+0x41/0xa0 sysvec_apic_timer_interrupt+0x8c/0xb0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1b/0x20 RIP: 0010:default_idle+0xb/0x10 RSP: 0018:ffffc9000012fea0 EFLAGS: 00000202 RAX: 000000000000bcae RBX: ffff888006660f00 RCX: 000000000000bcae RDX: 0000000000000001 RSI: ffffffff843a11c0 RDI: ffffffff843a1180 RBP: dffffc0000000000 R08: dffffc0000000000 R09: ffffed100da36d46 R10: dfffe9100da36d47 R11: ffffffff83cf0950 R12: 0000000000000000 R13: 1ffff11000ccc1e0 R14: ffffffff8542af28 R15: dffffc0000000000 ... Allocated by task 146: __kasan_kmalloc+0xc4/0xf0 sk_prot_alloc+0xdd/0x1a0 sk_alloc+0x2d/0x4e0 rose_create+0x7b/0x330 __sock_create+0x2dd/0x640 __sys_socket+0xc7/0x270 __x64_sys_socket+0x71/0x80 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 152: kasan_set_track+0x4c/0x70 kasan_set_free_info+0x1f/0x40 ____kasan_slab_free+0x124/0x190 kfree+0xd3/0x270 __sk_destruct+0x314/0x460 rose_release+0x2fa/0x3b0 sock_close+0xcb/0x230 __fput+0x2d9/0x650 task_work_run+0xd6/0x160 exit_to_user_mode_loop+0xc7/0xd0 exit_to_user_mode_prepare+0x4e/0x80 syscall_exit_to_user_mode+0x20/0x40 do_syscall_64+0x4f/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This patch adds refcount of sock when we use functions such as rose_start_heartbeat() and so on to start timer, and decreases the refcount of sock when timer is finished or deleted by functions such as rose_stop_heartbeat() and so on. As a result, the UAF bugs could be mitigated. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Tested-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20220629002640.5693-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:49:22 +08:00
Florian Westphal	7912b75a66	netfilter: nf_queue: do not allow packet truncation below transport header offset upstream commit: `99a63d36cb` Domingo Dirutigliano and Nicola Guerrera report kernel panic when sending nf_queue verdict with 1-byte nfta_payload attribute. The IP/IPv6 stack pulls the IP(v6) header from the packet after the input hook. If user truncates the packet below the header size, this skb_pull() will result in a malformed skb (skb->len < 0). Fixes: `7af4cc3fa1` ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink") Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:49:22 +08:00
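As a minimal sketch of the check the commit above describes, a verdict whose replacement payload is shorter than the transport header offset can be rejected before the packet is modified. The helper name `toy_check_verdict_len` and the choice of `-EINVAL` are assumptions made for illustration; they are not taken from the nfnetlink_queue source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: data_len is the length of the payload supplied with
 * the verdict, transport_offset is where the transport header starts.
 * Truncating below that offset would cut into the network header.
 */
static int toy_check_verdict_len(size_t data_len, size_t transport_offset)
{
    if (data_len < transport_offset)
        return -EINVAL;  /* assumed error code */
    return 0;
}
```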
Yuehong Wu	1768198a0c	config: add DRM_AST hyperV,network and DRM modules Some images are built from the 0009-kabi branch and are expected to run on some virtualization environments, and might be used for desktops. So enable the related drivers. More testing is needed, though. config: enable more commonly used DRM drivers config: enable CONFIG_DRM_AST config: enable hyperv related configs config: enable CONFIG_IGC Signed-off-by: Yuehong Wu <yuehongwu@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com>	2024-06-11 20:49:15 +08:00
sumiyawang	a0fc351741	driver: update hisilicon hardware crypto engine Update the hisilicon crypto driver to 1.3.11 and autoprobe the modules when the hardware is enabled. Signed-off-by: sumiyawang <sumiyawang@tencent.com>	2024-06-11 20:47:35 +08:00
Jann Horn	fa3c010699	net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup commit `57bc3d3ae8` upstream. ax88179_rx_fixup() contains several out-of-bounds accesses that can be triggered by a malicious (or defective) USB device, in particular: - The metadata array (hdr_off..hdr_off+2*pkt_cnt) can be out of bounds, causing OOB reads and (on big-endian systems) OOB endianness flips. - A packet can overlap the metadata array, causing a later OOB endianness flip to corrupt data used by a cloned SKB that has already been handed off into the network stack. - A packet SKB can be constructed whose tail is far beyond its end, causing out-of-bounds heap data to be considered part of the SKB's data. I have tested that this can be used by a malicious USB device to send a bogus ICMPv6 Echo Request and receive an ICMPv6 Echo Reply in response that contains random kernel heap data. It's probably also possible to get OOB writes from this on a little-endian system somehow - maybe by triggering skb_cow() via IP options processing -, but I haven't tested that. Fixes: `e2ca90c276` ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver") Cc: stable@kernel.org Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:44:41 +08:00
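The first out-of-bounds case listed in the commit above, a metadata array spanning hdr_off..hdr_off+2*pkt_cnt that runs past the received buffer, can be guarded with a bounds check along these lines. This is a sketch under assumptions: the function name and parameters are hypothetical, the span formula is copied from the commit message rather than from the driver, and the real fix covers more cases than this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: returns true only if the metadata span
 * [hdr_off, hdr_off + 2 * pkt_cnt) lies entirely inside a buffer of
 * buf_len bytes. The subtraction form avoids overflow in the comparison.
 */
static bool toy_metadata_in_bounds(size_t buf_len, uint16_t hdr_off,
                                   uint16_t pkt_cnt)
{
    size_t meta_len = (size_t)2 * pkt_cnt;

    if (hdr_off > buf_len)
        return false;
    return meta_len <= buf_len - hdr_off;
}
```

A device-supplied `hdr_off` or `pkt_cnt` that fails this check would cause the frame to be dropped instead of parsed.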
Jianping Liu	b5c25d0440	smp: fix slave core boot fail on altramax platform Retry for another 5000ms, disabling irqs every 5ms, to fix slave core boot failures on the altramax platform. The Ampere altramax platform has 256 CPU cores across multiple nodes. When CONFIG_HZ>=250, ticks are generated too frequently, which causes slave core boot to fail (an Ampere CPU bug). cpu0's irqs need to be disabled for at least 5ms each time, which reduces irq activity. Signed-off-by: Jianping Liu <frankjpliu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com> Reviewed-by: samuelliao <samuelliao@tencent.com>	2024-06-11 20:44:41 +08:00
Bhupesh Sharma	c2671630ad	arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo commit `bbdbc11804` upstream. TCR_EL1.TxSZ, which controls the VA space size, is configured by a single kernel image to support either 48-bit or 52-bit VA space. If the ARMv8.2-LVA optional feature is present and we are running with a 64KB page size, then it is possible to use 52-bits of address space for both userspace and kernel addresses. However, any kernel binary that supports 52-bit must also be able to fall back to 48-bit at early boot time if the hardware feature is not present. Since TCR_EL1.T1SZ indicates the size of the memory region addressed by TTBR1_EL1, export the same in vmcoreinfo. User-space utilities like makedumpfile and crash-utility need to read this value from vmcoreinfo for determining if a virtual address lies in the linear map range. While at it also add documentation for TCR_EL1.T1SZ variable being added to vmcoreinfo. It indicates the size offset of the memory region addressed by TTBR1_EL1. 
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> Tested-by: John Donnelly <john.p.donnelly@oracle.com> Tested-by: Kamlakant Patel <kamlakantp@marvell.com> Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dave Anderson <anderson@redhat.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: kexec@lists.infradead.org Link: https://lore.kernel.org/r/1589395957-24628-3-git-send-email-bhsharma@redhat.com [catalin.marinas@arm.com: removed vabits_actual from the commit log] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-06-11 20:44:41 +08:00
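To illustrate how a user-space tool might use the exported value: T1SZ is a size offset, so the region addressed by TTBR1_EL1 spans 2^(64 - T1SZ) bytes at the top of the address space. The sketch below derives the VA width and the lowest address of that region. The function names are hypothetical, and reading the value out of vmcoreinfo is left out on purpose; how makedumpfile or crash-utility actually consume it is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* VA bits covered by TTBR1_EL1; assumes 1 <= t1sz <= 63. */
static unsigned int toy_va_bits(unsigned int t1sz)
{
    return 64u - t1sz;
}

/* Lowest address of the TTBR1_EL1 region: all bits above va_bits set. */
static uint64_t toy_ttbr1_base(unsigned int t1sz)
{
    return ~UINT64_C(0) << toy_va_bits(t1sz);
}
```

For example, T1SZ = 16 corresponds to a 48-bit VA space and T1SZ = 12 to a 52-bit one.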
Maciej Fijalkowski	b675b19f3b	veth: Implement ethtool's get_channels() callback [upstream commit `34829eec3b`] Libbpf's xsk part calls get_channels() API to retrieve the queue count of the underlying driver so that XSKMAP is sized accordingly. Implement that in veth so multi queue scenarios can work properly. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210329224316.17793-14-maciej.fijalkowski@intel.com Signed-off-by: Menglong Dong <imagedong@tencent.com>	2024-06-11 20:44:40 +08:00
Xin Long	d812f3ef2e	xfrm: add prep for esp beet mode offload Like __xfrm_transport/mode_tunnel_prep(), this patch is to add __xfrm_mode_beet_prep() to fix the transport_header for gso segments, and reset skb mac_len, and pull skb data to the proto inside esp. This patch also fixes a panic, reported by ltp: # modprobe esp4_offload # runltp -f net_stress.ipsec_tcp [ 2452.780511] kernel BUG at net/core/skbuff.c:109! [ 2452.799851] Call Trace: [ 2452.800298] <IRQ> [ 2452.800705] skb_push.cold.98+0x14/0x20 [ 2452.801396] esp_xmit+0x17b/0x270 [esp4_offload] [ 2452.802799] validate_xmit_xfrm+0x22f/0x2e0 [ 2452.804285] __dev_queue_xmit+0x589/0x910 [ 2452.806264] __neigh_update+0x3d7/0xa50 [ 2452.806958] arp_process+0x259/0x810 [ 2452.807589] arp_rcv+0x18a/0x1c It was caused by the skb going to esp_xmit with a wrong transport header. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-06-11 20:44:40 +08:00
Alexander Mikhalitsyn	0ba944e1bd	shm: extend forced shm destroy to support objects from several IPC nses commit `85b6d24646` upstream. Currently, the exit_shm() function is not designed to work properly when task->sysvshm.shm_clist holds shm objects from different IPC namespaces. This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it leads to use-after-free (reproducer exists). This is an attempt to fix the problem by extending the exit_shm mechanism to handle shm destruction across several IPC namespaces. To achieve that we do several things: 1. add a namespace (non-refcounted) pointer to the struct shmid_kernel 2. during new shm object creation (newseg()/shmget syscall) we initialize this pointer with the current task's IPC ns 3. exit_shm() is fully reworked such that it traverses all shp's in task->sysvshm.shm_clist and gets the IPC namespace not from the current task as it did before but from the shp object itself, then calls shm_destroy(shp, ns). Note: We need to be really careful here, because as was said before (1), our pointer to the IPC ns is non-refcounted. To be on the safe side we use a special helper, get_ipc_ns_not_zero(), which takes a reference on the IPC ns only if the IPC ns is not in the "state of destruction". Q/A Q: Why can we access shp->ns memory using a non-refcounted pointer? A: Because the shp object lifetime is always shorter than the IPC namespace lifetime, so, if we get the shp object from task->sysvshm.shm_clist while holding task_lock(task), nobody can steal our namespace. Q: Does this patch change the semantics of the unshare/setns/clone syscalls? A: No. It just fixes an uncovered case where a process may leave an IPC namespace without getting its task->sysvshm.shm_clist list cleaned up. 
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com Fixes: `ab602f7991` ("shm: make exit_shm work proportional to task activity") Co-developed-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jianping Liu <frankjpliu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:44:39 +08:00
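The "take a reference only if the object is not already being destroyed" idea behind get_ipc_ns_not_zero() can be sketched with C11 atomics as below. This is a generic illustration of the increment-if-not-zero pattern, not the kernel helper itself: `toy_get_not_zero` is a hypothetical name, and a plain `atomic_int` stands in for the kernel's refcount type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to take a reference. Succeeds only while the count is non-zero,
 * i.e. while the object has not yet entered destruction.
 */
static bool toy_get_not_zero(atomic_int *cnt)
{
    int old = atomic_load(cnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(cnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that gets `false` back must treat the namespace as gone and skip the object rather than touch it.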
Haisu Wang	cee4b3596d	block: fix the incorrect spin_lock_irq to spin_lock The process already runs with irqs disabled, so spin_lock should be used instead of spin_lock_irq; otherwise spin_unlock_irq may re-enable irqs at the wrong stage. Call Trace: _raw_spin_lock_irq+0x20/0x24 blkcg_print_blkgs+0x4f/0xe0 blkg_print_stat_bytes+0x44/0x50 cgroup_seqfile_show+0x4c/0xb0 kernfs_seq_show+0x21/0x30 seq_read+0x14c/0x3f0 kernfs_fop_read+0x35/0x190 __vfs_read+0x18/0x40 vfs_read+0x99/0x160 ksys_read+0x61/0xe0 __x64_sys_read+0x1a/0x20 do_syscall_64+0x47/0x140 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: f2519e1ed9a16 ("blkcg: add per blkcg diskstats") Signed-off-by: Haisu Wang <haisuwang@tencent.com> Reviewed-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:44:39 +08:00
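The hazard in the commit above comes from the fact that the `_irq` unlock variant enables interrupts unconditionally instead of restoring the previous state. A toy model of just the interrupt flag, ignoring the lock itself, makes the difference visible. All names here are hypothetical and nothing below is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's interrupt-enable flag. */
struct toy_cpu {
    bool irqs_enabled;
};

/* Models spin_lock_irq(): disables irqs, then takes the lock. */
static void toy_lock_irq(struct toy_cpu *c)
{
    c->irqs_enabled = false;
}

/* Models spin_unlock_irq(): releases the lock, then enables irqs. */
static void toy_unlock_irq(struct toy_cpu *c)
{
    c->irqs_enabled = true;
}

/* Models spin_lock()/spin_unlock(): the irq state is left untouched. */
static void toy_lock(struct toy_cpu *c)
{
    (void)c;
}

static void toy_unlock(struct toy_cpu *c)
{
    (void)c;
}
```

Starting from a caller that already has irqs disabled, the `_irq` pair leaves them enabled on return, while the plain pair preserves the caller's state.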
shookliu	904baaf92b	md/raid10: avoid deadlock on recovery. When disk failure happens and the array has a spare drive, resync thread kicks in and starts to refill the spare. However it may get blocked by a retry thread that resubmits failed IO to a mirror and itself can get blocked on a barrier raised by the resync thread. upstream commit id:fe630de009d0729584d79c78f43121e07c745fdc Acked-by: Nigel Croxon <ncroxon@redhat.com> Signed-off-by: Vitaly Mayatskikh <vmayatskikh@digitalocean.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: shookliu <shookliu@tencent.com>	2024-06-11 20:44:39 +08:00
Kairui Song	d1c25caef9	x86/mpparse, kexec: switch apic driver early if x2apic is pre-enabled Following kernel panic is observed when doing kexec/kdump on machines that use mptable, and supports x2apic: [ 0.010090] Intel MultiProcessor Specification v1.4 [ 0.010688] MPTABLE: OEM ID: BOCHSCPU [ 0.010886] MPTABLE: Product ID: 0.1 [ 0.011119] MPTABLE: APIC at: 0xFEE00000 [ 0.011332] BUG: unable to handle page fault for address: ffffffffff5fc020 [ 0.011702] #PF: supervisor read access in kernel mode [ 0.011981] #PF: error_code(0x0000) - not-present page [ 0.012256] PGD 25e15067 P4D 25e15067 PUD 25e17067 PMD 25e18067 PTE 0 [ 0.012603] Oops: 0000 [#1] SMP NOPTI [ 0.012801] CPU: 0 PID: 0 Comm: swapper Not tainted 5.14.10-300.fc35.x86_64 #1 [ 0.013189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014 [ 0.013658] RIP: 0010:native_apic_mem_read+0x2/0x10 [ 0.013924] Code: 14 25 20 cd e3 82 c3 90 bf 30 08 00 00 ff 14 25 18 cd e3 82 c3 cc cc cc 89 ff 89 b7 00 c0 5f ff c3 0f 1f 80 00 00 00 00 89 ff <8b> 87 00 c0 5f ff c3 0f 1f 80 00 00 00 0 [ 0.014930] RSP: 0000:ffffffff82e03e18 EFLAGS: 00010046 [ 0.015211] RAX: ffffffff81064840 RBX: ffffffffff240b6c RCX: ffffffff82f17428 [ 0.015593] RDX: c0000000ffffdfff RSI: 00000000ffffdfff RDI: 0000000000000020 [ 0.015977] RBP: ffff888023200000 R08: 0000000000000000 R09: ffffffff82e03c50 [ 0.016385] R10: ffffffff82e03c48 R11: ffffffff82f47468 R12: ffffffffff240b40 [ 0.016768] R13: ffffffffff200b30 R14: 0000000000000000 R15: 00000000000000d4 [ 0.017155] FS: 0000000000000000(0000) GS:ffffffff8365b000(0000) knlGS:0000000000000000 [ 0.017589] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.017899] CR2: ffffffffff5fc020 CR3: 0000000025e10000 CR4: 00000000000006b0 [ 0.018284] Call Trace: [ 0.018417] ? read_apic_id+0x15/0x30 [ 0.018616] ? register_lapic_address+0x76/0x97 [ 0.018864] ? default_get_smp_config+0x28b/0x42d [ 0.019119] ? dmi_check_system+0x1c/0x60 [ 0.019337] ? 
acpi_boot_init+0x1d/0x4c3 [ 0.019550] ? setup_arch+0xb37/0xc2a [ 0.019749] ? slab_is_available+0x5/0x10 [ 0.019969] ? start_kernel+0x61/0x980 [ 0.020173] ? load_ucode_bsp+0x4c/0xcd [ 0.020380] ? secondary_startup_64_no_verify+0xc2/0xcb [ 0.020664] Modules linked in: [ 0.020830] CR2: ffffffffff5fc020 [ 0.021012] random: get_random_bytes called from oops_exit+0x35/0x60 with crng_init=0 [ 0.021015] ---[ end trace c9e569df3bdbefd3 ]--- Checking following init order we have: setup_arch() check_x2apic() <-- x2apic is enabled by first kernel before kexec, this set x2apic_mode = 1, make sure later probes will recognize pre-enabled x2apic. .... acpi_boot_init(); <-- With ACPI MADT, this will switch apic driver to x2apic, but it will do nothing with mptable. x86_dtb_init(); get_smp_config(); default_get_smp_config(); check_physptr(); smp_read_mpc(); register_lapic_address(); <-- panic here init_apic_mappings(); .... The problem here is mpparse need to read some boot info from apic, so calls register_lapic_address() early. But without MADT, apic driver is still apic_flat, it attempts to use the MMIO interface which is never mapped since: commit `0450193bff` ("x86, x2apic: Don't map lapic addr for preenabled x2apic systems") Simply map it won't work either as in x2apic mode the MMIO interface is not really available (Intel SDM Volume 3A 10.12.2), later code will fail with other errors. So here we do the apic driver probe early. With pre-enabled x2apic, the probe will recognize it and switch to the right driver just fine. Such issue is currently only seen with kexec/kdump, which enabled the x2apic in first kernel and kept it enabled to 2nd kernel. This can be easily reproduced with qemu, use -no-acpi and enable x2apic. Signed-off-by: Kairui Song <kasong@tencent.com>	2024-06-11 20:44:38 +08:00
Liuchun	48bc119a5a	cpuhotplug: reject core0 offline by default Core 0 of some server models with ARM architecture cannot be taken offline, so it is rejected by default. Signed-off-by: Chun Liu <kaicliu@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:44:38 +08:00
mayercheng	8769f0840e	driver: update megaraid_sas to 07.721.02.00 Signed-off-by: mayercheng <mayercheng@tencent.com>	2024-06-11 20:44:38 +08:00
Xin Long	f34f3b4e56	sctp: use call_rcu to free endpoint [ Upstream commit `5ec7d18d18` ] This patch is to delay the endpoint free by calling call_rcu() to fix another use-after-free issue in sctp_sock_dump(): BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20 Call Trace: __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218 lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:334 [inline] __lock_sock+0x203/0x350 net/core/sock.c:2253 lock_sock_nested+0xfe/0x120 net/core/sock.c:2774 lock_sock include/net/sock.h:1492 [inline] sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324 sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091 sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527 __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049 inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065 netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244 __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352 netlink_dump_start include/linux/netlink.h:216 [inline] inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170 __sock_diag_cmd net/core/sock_diag.c:232 [inline] sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274 This issue occurs when asoc is peeled off and the old sk is freed after getting it by asoc->base.sk and before calling lock_sock(sk). To prevent the sk free, as a holder of the sk, ep should be alive when calling lock_sock(). This patch uses call_rcu() and moves sock_put and ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to hold the ep under rcu_read_lock in sctp_transport_traverse_process(). 
If sctp_endpoint_hold() returns true, it means this ep is still alive and we have held it and can continue to dump it; If it returns false, it means this ep is dead and can be freed after rcu_read_unlock, and we should skip it. In sctp_sock_dump(), after locking the sk, if this ep is different from tsp->asoc->ep, it means during this dumping, this asoc was peeled off before calling lock_sock(), and the sk should be skipped; If this ep is the same with tsp->asoc->ep, it means no peeloff happens on this asoc, and due to lock_sock, no peeloff will happen either until release_sock. Note that delaying endpoint free won't delay the port release, as the port release happens in sctp_endpoint_destroy() before calling call_rcu(). Also, freeing endpoint by call_rcu() makes it safe to access the sk by asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv(). Thanks Jones to bring this issue up. v1->v2: - improve the changelog. - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed. Reported-by: syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com Reported-by: Lee Jones <lee.jones@linaro.org> Fixes: `d25adbeb0c` ("sctp: fix an use-after-free issue in sctp_sock_dump") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Fuhai Wang <fuhaiwang@tencent.com>	2024-06-11 20:44:37 +08:00
Hangyu Hua	014df595cb	phonet: refcount leak in pep_sock_accep commit `bcd0f93353` upstream. sock_hold(sk) is invoked in pep_sock_accept(), but __sock_put(sk) is not invoked in subsequent failure branches(pep_accept_conn() != 0). Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20211209082839.33985-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Aayush Agarwal <aayush.a.agarwal@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Fuhai Wang <fuhaiwang@tencent.com>	2024-06-11 20:44:37 +08:00
Pablo Neira Ayuso	f4f8bcc4f1	netfilter: nf_tables: disallow non-stateful expression in sets earlier commit `520778042c` upstream. Since `3e135cd499` ("netfilter: nft_dynset: dynamic stateful expression instantiation"), it is possible to attach stateful expressions to set elements. `cd5125d8f5` ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") introduces conditional destruction on the object to accommodate transaction semantics. nft_expr_init() calls expr->ops->init() first, then checks for NFT_STATEFUL_EXPR; this still allows initializing a non-stateful lookup expression which points to a set, which might lead to UAF since the set is not properly detached from the set->binding for this case. Anyway, this combination is nonsense from the nf_tables perspective. This patch fixes this problem by checking for NFT_STATEFUL_EXPR before expr->ops->init() is called. The reporter provides a KASAN splat and a poc reproducer (similar to those autogenerated by syzbot to report use-after-free errors). It is unknown to me if they are using syzbot or if they use a similar automated tool to locate the bug that they are reporting. For the record, this is the KASAN splat. 
[ 85.431824] ================================================================== [ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20 [ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776 [ 85.434756] [ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ #2 [ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Fixes: `0b2d8a7b63` ("netfilter: nf_tables: add helper functions for expression handling") Reported-and-tested-by: Aaron Adams <edg-e@nccgroup.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> [Ajay: Regenerated the patch for v5.4.y] Signed-off-by: Ajay Kaher <akaher@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Fuhai Wang <fuhaiwang@tencent.com>	2024-06-11 20:44:37 +08:00
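The reordering described in the commit above, checking the stateful flag before the init callback runs so that a rejected expression never gets a chance to bind itself to anything, can be sketched as follows. The structure, the flag value, the function names and the `-EOPNOTSUPP` return are all assumptions for illustration; they are not the nf_tables definitions.

```c
#include <assert.h>
#include <errno.h>

#define TOY_STATEFUL 0x1u  /* hypothetical flag, stands in for NFT_STATEFUL_EXPR */

/* Hypothetical expression ops: a flags word plus an init callback. */
struct toy_expr_ops {
    unsigned int flags;
    int (*init)(int *state);
};

/* Example init callback with a visible side effect. */
static int toy_init(int *state)
{
    (*state)++;
    return 0;
}

/*
 * Validate first, initialize second: a non-stateful expression is
 * rejected before init() can produce any side effects.
 */
static int toy_expr_init_for_set(const struct toy_expr_ops *ops, int *state)
{
    if (!(ops->flags & TOY_STATEFUL))
        return -EOPNOTSUPP;  /* assumed error code */
    return ops->init(state);
}
```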

873650 Commits

All Branches