OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
caelli	fcad35499e	Revert "Merge branch 'pub/lts/caelli_ras' into 'pub/lts/0009-kabi' (merge request !671 )" This reverts commit `1eda0438d7`.	2024-06-11 20:43:53 +08:00
Helge Deller	e431a54055	fbcon: Disallow setting font bigger than screen size upstream commit id: `65a01e601d` Prevent that users set a font size which is bigger than the physical screen. It's unlikely this may happen (because screens are usually much larger than the fonts and each font char is limited to 32x32 pixels), but it may happen on smaller screens/LCD displays. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Anakin Zhang <anakinzhang@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:43:52 +08:00
Jann Horn	c460cf4df6	ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE upstream commit id: `ee1fee9005` Setting PTRACE_O_SUSPEND_SECCOMP is supposed to be a highly privileged operation because it allows the tracee to completely bypass all seccomp filters on kernels with CONFIG_CHECKPOINT_RESTORE=y. It is only supposed to be settable by a process with global CAP_SYS_ADMIN, and only if that process is not subject to any seccomp filters at all. However, while these permission checks were done on the PTRACE_SETOPTIONS path, they were missing on the PTRACE_SEIZE path, which also sets user-specified ptrace flags. Move the permissions checks out into a helper function and let both ptrace_attach() and ptrace_setoptions() call it. Cc: stable@kernel.org Fixes: `13c4a90119` ("seccomp: add ptrace options for suspend/resume") Signed-off-by: Jann Horn <jannh@google.com> Link: https://lkml.kernel.org/r/20220319010838.1386861-1-jannh@google.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Anakin Zhang <anakinzhang@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:43:52 +08:00
Szymon Heidrich	7f48f76582	USB: gadget: validate endpoint index for xilinx udc upstream commit id: `7f14c7227f` Assure that host may not manipulate the index to point past endpoint array. Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Anakin Zhang <anakinzhang@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:43:52 +08:00
Jordy Zomer	7aa62bdcb7	nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION upstream commit id: `4fbcc1a4cb` It appears that there are some buffer overflows in EVT_TRANSACTION. This happens because the length parameters that are passed to memcpy come directly from skb->data and are not guarded in any way. Signed-off-by: Jordy Zomer <jordy@pwning.systems> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Anakin Zhang <anakinzhang@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:43:51 +08:00
Lin Ma	3c58c9735f	hamradio: improve the incomplete fix to avoid NPD commit `b2f37aead1` upstream. The previous commit `3e0588c291` ("hamradio: defer ax25 kfree after unregister_netdev") reorder the kfree operations and unregister_netdev operation to prevent UAF. This commit improves the previous one by also deferring the nullify of the ax->tty pointer. Otherwise, a NULL pointer dereference bug occurs. Partial of the stack trace is shown below. BUG: kernel NULL pointer dereference, address: 0000000000000538 RIP: 0010:ax_xmit+0x1f9/0x400 ... Call Trace: dev_hard_start_xmit+0xec/0x320 sch_direct_xmit+0xea/0x240 __qdisc_run+0x166/0x5c0 __dev_queue_xmit+0x2c7/0xaf0 ax25_std_establish_data_link+0x59/0x60 ax25_connect+0x3a0/0x500 ? security_socket_connect+0x2b/0x40 __sys_connect+0x96/0xc0 ? __hrtimer_init+0xc0/0xc0 ? common_nsleep+0x2e/0x50 ? switch_fpu_return+0x139/0x1a0 __x64_sys_connect+0x11/0x20 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The crash point is shown as below static void ax_encaps(...) { ... set_bit(TTY_DO_WRITE_WAKEUP, &ax->tty->flags); // ax->tty = NULL! ... } By placing the nullify action after the unregister_netdev, the ax->tty pointer won't be assigned as NULL net_device framework layer is well synchronized. Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:51 +08:00
Lin Ma	fd91257223	hamradio: defer ax25 kfree after unregister_netdev commit `3e0588c291` upstream. There is a possible race condition (use-after-free) like below (USE) \| (FREE) ax25_sendmsg \| ax25_queue_xmit \| dev_queue_xmit \| __dev_queue_xmit \| __dev_xmit_skb \| sch_direct_xmit \| ... xmit_one \| netdev_start_xmit \| tty_ldisc_kill __netdev_start_xmit \| mkiss_close ax_xmit \| kfree ax_encaps \| \| Even though there are two synchronization primitives before the kfree: 1. wait_for_completion(&ax->dead). This can prevent the race with routines from mkiss_ioctl. However, it cannot stop the routine coming from upper layer, i.e., the ax25_sendmsg. 2. netif_stop_queue(ax->dev). It seems that this line of code aims to halt the transmit queue but it fails to stop the routine that already being xmit. This patch reorder the kfree after the unregister_netdev to avoid the possible UAF as the unregister_netdev() is well synchronized and won't return if there is a running routine. Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:51 +08:00
Pavel Skripkin	177d23c5f0	net: hamradio: fix memory leak in mkiss_close [ Upstream commit `7edcc68230` ] My local syzbot instance hit memory leak in mkiss_open()[1]. The problem was in missing free_netdev() in mkiss_close(). In mkiss_open() netdevice is allocated and then registered, but in mkiss_close() netdevice was only unregistered, but not freed. Fail log: BUG: memory leak unreferenced object 0xffff8880281ba000 (size 4096): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): 61 78 30 00 00 00 00 00 00 00 00 00 00 00 00 00 ax0............. 00 27 fa 2a 80 88 ff ff 00 00 00 00 00 00 00 00 .'.*............ backtrace: [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0 [<ffffffff8706e7e8>] alloc_netdev_mqs+0x98/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae BUG: memory leak unreferenced object 0xffff8880141a9a00 (size 96): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): e8 a2 1b 28 80 88 ff ff e8 a2 1b 28 80 88 ff ff ...(.......(.... 98 92 9c aa b0 40 02 00 00 00 00 00 00 00 00 00 .....@.......... backtrace: [<ffffffff8709f68b>] __hw_addr_create_ex+0x5b/0x310 [<ffffffff8709fb38>] __hw_addr_add_ex+0x1f8/0x2b0 [<ffffffff870a0c7b>] dev_addr_init+0x10b/0x1f0 [<ffffffff8706e88b>] alloc_netdev_mqs+0x13b/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae BUG: memory leak unreferenced object 0xffff8880219bfc00 (size 512): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): 00 a0 1b 28 80 88 ff ff 80 8f b1 8d ff ff ff ff ...(............ 80 8f b1 8d ff ff ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0 [<ffffffff8706eec7>] alloc_netdev_mqs+0x777/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae BUG: memory leak unreferenced object 0xffff888029b2b200 (size 256): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0 [<ffffffff8706f062>] alloc_netdev_mqs+0x912/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `815f62bf74` ("[PATCH] SMP rewrite of mkiss") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:50 +08:00
Pavel Skripkin	b1011628cc	ath9k: fix use-after-free in ath9k_hif_usb_rx_cb [ Upstream commit `0ac4827f78` ] Syzbot reported use-after-free Read in ath9k_hif_usb_rx_cb() [0]. The problem was in incorrect htc_handle->drv_priv initialization. Probable call trace which can trigger use-after-free: ath9k_htc_probe_device() /* htc_handle->drv_priv = priv; / ath9k_htc_wait_for_target() <--- Failed ieee80211_free_hw() <--- priv pointer is freed <IRQ> ... ath9k_hif_usb_rx_cb() ath9k_hif_usb_rx_stream() RX_STAT_INC() <--- htc_handle->drv_priv access In order to not add fancy protection for drv_priv we can move htc_handle->drv_priv initialization at the end of the ath9k_htc_probe_device() and add helper macro to make all _STAT_* macros NULL safe, since syzbot has reported related NULL deref in that macros [1] Link: https://syzkaller.appspot.com/bug?id=6ead44e37afb6866ac0c7dd121b4ce07cb665f60 [0] Link: https://syzkaller.appspot.com/bug?id=b8101ffcec107c0567a0cd8acbbacec91e9ee8de [1] Fixes: `fb9987d0f7` ("ath9k_htc: Support for AR9271 chipset.") Reported-and-tested-by: syzbot+03110230a11411024147@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+c6dde1f690b60e0b9fbe@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/d57bbedc857950659bfacac0ab48790c1eda00c8.1655145743.git.paskripkin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:50 +08:00
Hangyu Hua	ebb6b333e9	can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path commit `c702227522` upstream. There is no need to call dev_kfree_skb() when usb_submit_urb() fails beacause can_put_echo_skb() deletes the original skb and can_free_echo_skb() deletes the cloned skb. Link: https://lore.kernel.org/all/20220228083639.38183-1-hbh25y@gmail.com Fixes: `702171adee` ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface") Cc: stable@vger.kernel.org Cc: Sebastian Haas <haas@ems-wuensche.com> Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:49 +08:00
Hangyu Hua	c32c99a019	can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path commit `04c9b00ba8` upstream. There is no need to call dev_kfree_skb() when usb_submit_urb() fails because can_put_echo_skb() deletes original skb and can_free_echo_skb() deletes the cloned skb. Fixes: `51f3baad7d` ("can: mcba_usb: Add support for Microchip CAN BUS Analyzer") Link: https://lore.kernel.org/all/20220311080208.45047-1-hbh25y@gmail.com Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:49 +08:00
Hangyu Hua	0909f6d0f7	can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path commit `3d3925ff64` upstream. There is no need to call dev_kfree_skb() when usb_submit_urb() fails because can_put_echo_skb() deletes original skb and can_free_echo_skb() deletes the cloned skb. Fixes: `0024d8ad16` ("can: usb_8dev: Add support for USB2CAN interface from 8 devices") Link: https://lore.kernel.org/all/20220311080614.45229-1-hbh25y@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> [DP: adjusted params of can_free_echo_skb() for 5.4 stable] Signed-off-by: Dragos-Marian Panait <dragos.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:49 +08:00
Oliver Neukum	74fa958963	sr9700: sanity check for packet length commit `e9da0b56fe` upstream. A malicious device can leak heap data to user space providing bogus frame lengths. Introduce a sanity check. Signed-off-by: Oliver Neukum <oneukum@suse.com> Reviewed-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:48 +08:00
Hangyu Hua	275527ef00	yam: fix a memory leak in yam_siocdevprivate() [ Upstream commit `29eb315427` ] ym needs to be free when ym->cmd != SIOCYAMSMCS. Fixes: `0781168e23` ("yam: fix a missing-check bug") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:48 +08:00
Duoming Zhou	f889a7bec0	drivers: hamradio: 6pack: fix UAF bug caused by mod_timer() commit `efe4186e6a` upstream. When a 6pack device is detaching, the sixpack_close() will act to cleanup necessary resources. Although del_timer_sync() in sixpack_close() won't return if there is an active timer, one could use mod_timer() in sp_xmit_on_air() to wake up timer again by calling userspace syscall such as ax25_sendmsg(), ax25_connect() and ax25_ioctl(). This unexpected waked handler, sp_xmit_on_air(), realizes nothing about the undergoing cleanup and may still call pty_write() to use driver layer resources that have already been released. One of the possible race conditions is shown below: (USE) \| (FREE) ax25_sendmsg() \| ax25_queue_xmit() \| ... \| sp_xmit() \| sp_encaps() \| sixpack_close() sp_xmit_on_air() \| del_timer_sync(&sp->tx_t) mod_timer(&sp->tx_t,...) \| ... \| unregister_netdev() \| ... (wait a while) \| tty_release() \| tty_release_struct() \| release_tty() sp_xmit_on_air() \| tty_kref_put(tty_struct) //FREE pty_write(tty_struct) //USE \| ... The corresponding fail log is shown below: =============================================================== BUG: KASAN: use-after-free in __run_timers.part.0+0x170/0x470 Write of size 8 at addr ffff88800a652ab8 by task swapper/2/0 ... Call Trace: ... queue_work_on+0x3f/0x50 pty_write+0xcd/0xe0pty_write+0xcd/0xe0 sp_xmit_on_air+0xb2/0x1f0 call_timer_fn+0x28/0x150 __run_timers.part.0+0x3c2/0x470 run_timer_softirq+0x3b/0x80 __do_softirq+0xf1/0x380 ... This patch reorders the del_timer_sync() after the unregister_netdev() to avoid UAF bugs. Because the unregister_netdev() is well synchronized, it flushs out any pending queues, waits the refcount of net_device decreases to zero and removes net_device from kernel. There is not any running routines after executing unregister_netdev(). Therefore, we could not arouse timer from userspace again. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:48 +08:00
Zekun Shen	ab4b976f52	mwifiex: Fix skb_over_panic in mwifiex_usb_recv() [ Upstream commit `04d80663f6` ] Currently, with an unknown recv_type, mwifiex_usb_recv just return -1 without restoring the skb. Next time mwifiex_usb_rx_complete is invoked with the same skb, calling skb_put causes skb_over_panic. The bug is triggerable with a compromised/malfunctioning usb device. After applying the patch, skb_over_panic no longer shows up with the same input. Attached is the panic report from fuzzing. skbuff: skb_over_panic: text:000000003bf1b5fa len:2048 put:4 head:00000000dd6a115b data:000000000a9445d8 tail:0x844 end:0x840 dev:<NULL> kernel BUG at net/core/skbuff.c:109! invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 PID: 198 Comm: in:imklog Not tainted 5.6.0 #60 RIP: 0010:skb_panic+0x15f/0x161 Call Trace: <IRQ> ? mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb] skb_put.cold+0x24/0x24 mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb] __usb_hcd_giveback_urb+0x1e4/0x380 usb_giveback_urb_bh+0x241/0x4f0 ? __hrtimer_run_queues+0x316/0x740 ? __usb_hcd_giveback_urb+0x380/0x380 tasklet_action_common.isra.0+0x135/0x330 __do_softirq+0x18c/0x634 irq_exit+0x114/0x140 smp_apic_timer_interrupt+0xde/0x380 apic_timer_interrupt+0xf/0x20 </IRQ> Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu> Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/YX4CqjfRcTa6bVL+@Zekuns-MBP-16.fios-router.home Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:47 +08:00
Juergen Gross	0e3657c619	xen/netback: don't queue unlimited number of packages commit `be81992f90` upstream. In case a guest isn't consuming incoming network traffic as fast as it is coming in, xen-netback is buffering network packages in unlimited numbers today. This can result in host OOM situations. Commit `f48da8b14d` ("xen-netback: fix unlimited guest Rx internal queue and carrier flapping") meant to introduce a mechanism to limit the amount of buffered data by stopping the Tx queue when reaching the data limit, but this doesn't work for cases like UDP. When hitting the limit don't queue further SKBs, but drop them instead. In order to be able to tell Rx packages have been dropped increment the rx_dropped statistics counter in this case. It should be noted that the old solution to continue queueing SKBs had the additional problem of an overflow of the 32-bit rx_queue_len value would result in intermittent Tx queue enabling. This is part of XSA-392 Fixes: `f48da8b14d` ("xen-netback: fix unlimited guest Rx internal queue and carrier flapping") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:47 +08:00
Juergen Gross	d6cc3dbf63	xen/netback: fix rx queue stall detection commit `6032046ec4` upstream. Commit `1d5d485239` ("xen-netback: require fewer guest Rx slots when not using GSO") introduced a security problem in netback, as an interface would only be regarded to be stalled if no slot is available in the rx queue ring page. In case the SKB at the head of the queued requests will need more than one rx slot and only one slot is free the stall detection logic will never trigger, as the test for that is only looking for at least one slot to be free. Fix that by testing for the needed number of slots instead of only one slot being available. In order to not have to take the rx queue lock that often, store the number of needed slots in the queue data. As all SKB dequeue operations happen in the rx queue kernel thread this is safe, as long as the number of needed slots is accessed via READ/WRITE_ONCE() only and updates are always done with the rx queue lock held. Add a small helper for obtaining the number of free slots. This is part of XSA-392 Fixes: `1d5d485239` ("xen-netback: require fewer guest Rx slots when not using GSO") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:47 +08:00
Haimin Zhang	6f5c48db33	netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc [ Upstream commit `481221775d` ] Zero-initialize memory for new map's value in function nsim_bpf_map_alloc since it may cause a potential kernel information leak issue, as follows: 1. nsim_bpf_map_alloc calls nsim_map_alloc_elem to allocate elements for a new map. 2. nsim_map_alloc_elem uses kmalloc to allocate map's value, but doesn't zero it. 3. A user application can use IOCTL BPF_MAP_LOOKUP_ELEM to get specific element's information in the map. 4. The kernel function map_lookup_elem will call bpf_map_copy_value to get the information allocated at step-2, then use copy_to_user to copy to the user buffer. This can only leak information for an array map. Fixes: `395cacb5f1` ("netdevsim: bpf: support fake map offload") Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com> Link: https://lore.kernel.org/r/20211215111530.72103-1-tcs.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Honglin Li <honglinli@tencent.com>	2024-06-11 20:43:46 +08:00
Juergen Gross	5a69765bc9	x86: Clear .brk area at early boot commit `38fa5479b4` upstream fix CVE-2022-36123 The .brk section has the same properties as .bss: it is an alloc-only section and should be cleared before being used. Not doing so is especially a problem for Xen PV guests, as the hypervisor will validate page tables (check for writable page tables and hypervisor private bits) before accepting them to be used. Make sure .brk is initially zero by letting clear_bss() clear the brk area, too. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220630071441.28576-3-jgross@suse.com Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:41:50 +08:00
Vitaly Kuznetsov	439f3a0e9a	KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast() commit `00b5f37189` upstream fix CVE-2022-2153 When kvm_irq_delivery_to_apic_fast() is called with APIC_DEST_SELF shorthand, 'src' must not be NULL. Crash the VM with KVM_BUG_ON() instead of crashing the host. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220325132140.25650-3-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:41:49 +08:00
Sean Christopherson	7d79dbeffa	KVM: Add infrastructure and macro to mark VM as bugged commit `0b8f11737c` upstream CVE-2022-2153 dependence Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <3a0998645c328bf0895f1290e61821b70f048549.1625186503.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:41:49 +08:00
Xinghui Li	2b5399ae57	KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID commit `9f46c187e2` upstream. fix CVE-2022-1789 With shadow paging enabled, the INVPCID instruction results in a call to kvm_mmu_invpcid_gva. If INVPCID is executed with CR0.PG=0, the invlpg callback is not set and the result is a NULL pointer dereference. Fix it trivially by checking for mmu->invlpg before every call. There are other possibilities: - check for CR0.PG, because KVM (like all Intel processors after P5) flushes guest TLB on CR0.PG changes so that INVPCID/INVLPG are a nop with paging disabled - check for EFER.LMA, because KVM syncs and flushes when switching MMU contexts outside of 64-bit mode All of these are tricky, go for the simple solution. This is CVE-2022-1789. Reported-by: Yongkang Jia <kangel@zju.edu.cn> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:41:49 +08:00
Naoya Horiguchi	71bc658c55	mm/hwpoison: fix error page recovered but reported "not recovered" commit `046545a661` upstream. When an uncorrected memory error is consumed there is a race between the CMCI from the memory controller reporting an uncorrected error with a UCNA signature, and the core reporting and SRAR signature machine check when the data is about to be consumed. If the CMCI wins that race, the page is marked poisoned when uc_decode_notifier() calls memory_failure() and the machine check processing code finds the page already poisoned. It calls kill_accessing_process() to make sure a SIGBUS is sent. But returns the wrong error code. Console log looks like this: [34775.674296] mce: Uncorrected hardware memory error in user-access at 3710b3400 [34775.675413] Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered [34775.690310] Memory failure: 0x3710b3: already hardware poisoned [34775.696247] Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption [34775.706072] mce: Memory error not recovered kill_accessing_process() is supposed to return -EHWPOISON to notify that SIGBUS is already set to the process and kill_me_maybe() doesn't have to send it again. But current code simply fails to do this, so fix it to make sure to work as intended. This change avoids the noise message "Memory error not recovered" and skips duplicate SIGBUSs. [tony.luck@intel.com: reword some parts of commit message] Link: https://lkml.kernel.org/r/20220113231117.1021405-1-naoya.horiguchi@linux.dev Fixes: `a3f5d80ea4` ("mm,hwpoison: send SIGBUS with error virutal address") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Youquan Song <youquan.song@intel.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:48 +08:00
Youquan Song	cbb0d67ba2	x86/mce: Reduce number of machine checks taken during recovery commit `3376136300` upstream. When any of the copy functions in arch/x86/lib/copy_user_64.S take a fault, the fixup code copies the remaining byte count from %ecx to %edx and unconditionally jumps to .Lcopy_user_handle_tail to continue the copy in case any more bytes can be copied. If the fault was #PF this may copy more bytes (because the page fault handler might have fixed the fault). But when the fault is a machine check the original copy code will have copied all the way to the poisoned cache line. So .Lcopy_user_handle_tail will just take another machine check for no good reason. Every code path to .Lcopy_user_handle_tail comes from an exception fixup path, so add a check there to check the trap type (in %eax) and simply return the count of remaining bytes if the trap was a machine check. Doing this reduces the number of machine checks taken during synthetic tests from four to three. As well as reducing the number of machine checks, this also allows Skylake generation Xeons to recover some cases that currently fail. The is because REP; MOVSB is only recoverable when source and destination are well aligned and the byte count is large. That useless call to .Lcopy_user_handle_tail may violate one or more of these conditions and generate a fatal machine check. [ Tony: Add more details to commit message. ] [ bp: Fixup comment. Also, another tip patchset which is adding straight-line speculation mitigation changes the "ret" instruction to an all-caps macro "RET". But, since gas is case-insensitive, use "RET" in the newly added asm block already in order to simplify tip branch merging on its way upstream. ] Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/YcTW5dh8yTGucDd+@agluck-desk2.amr.corp.intel.com Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:48 +08:00
Al Viro	1e8310cc49	Intel: generic_perform_write()/iomap_write_actor(): saner logics for short copy commit `bc1bb416bb` upstream if we run into a short copy and ->write_end() refuses to advance at all, use the amount we'd managed to copy for the next iteration to handle. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:48 +08:00
Tony Luck	2c067f2ca8	Intel: x86/mce: Drop copyin special case for #MC commit `690658471b` upstream Fixes to the iterator code to handle faults that are not on page boundaries mean that the special case for machine check during copy from user is no longer needed. For a full list of those fixes, see the output of: git log --oneline v5.14 ^v5.13 -- lib/iov_iter.c Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210818002942.1607544-4-tony.luck@intel.com Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:47 +08:00
Tony Luck	d1eba1ab3b	Intel: x86/mce: Change to not send SIGBUS error during copy from user commit `a6e3cf70b7` upstream Sending a SIGBUS for a copy from user is not the correct semantic. System calls should return -EFAULT (or a short count for write(2)). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210818002942.1607544-3-tony.luck@intel.com Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:47 +08:00
Tony Luck	992118e2e2	Intel: x86/mce: Avoid infinite loop for copy from user recovery commit `81065b35e2` upstream There are two cases for machine check recovery: 1) The machine check was triggered by ring3 (application) code. This is the simpler case. The machine check handler simply queues work to be executed on return to user. That code unmaps the page from all users and arranges to send a SIGBUS to the task that triggered the poison. 2) The machine check was triggered in kernel code that is covered by an exception table entry. In this case the machine check handler still queues a work entry to unmap the page, etc. but this will not be called right away because the #MC handler returns to the fix up code address in the exception table entry. Problems occur if the kernel triggers another machine check before the return to user processes the first queued work item. Specifically, the work is queued using the ->mce_kill_me callback structure in the task struct for the current thread. Attempting to queue a second work item using this same callback results in a loop in the linked list of work functions to call. So when the kernel does return to user, it enters an infinite loop processing the same entry for ever. There are some legitimate scenarios where the kernel may take a second machine check before returning to the user. 1) Some code (e.g. futex) first tries a get_user() with page faults disabled. If this fails, the code retries with page faults enabled expecting that this will resolve the page fault. 2) Copy from user code retries a copy in byte-at-time mode to check whether any additional bytes can be copied. On the other side of the fence are some bad drivers that do not check the return value from individual get_user() calls and may access multiple user addresses without noticing that some/all calls have failed. Fix by adding a counter (current->mce_count) to keep track of repeated machine checks before task_work() is called. First machine check saves the address information and calls task_work_add(). Subsequent machine checks before that task_work call back is executed check that the address is in the same page as the first machine check (since the callback will offline exactly one page). Expected worst case is four machine checks before moving on (e.g. one user access with page faults disabled, then a repeat to the same address with page faults enabled ... repeat in copy tail bytes). Just in case there is some code that loops forever enforce a limit of 10. [ bp: Massage commit message, drop noinstr, fix typo, extend panic messages. ] Fixes: `5567d11c21` ("x86/mce: Send #MC singal from task work") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:46 +08:00
Youquan Song	fb6714a362	Intel: EDAC/i10nm: Retrieve and print retry_rd_err_log registers commit `cf4e6d52f5` upstream Retrieve and print retry_rd_err_log registers like the earlier change: commit `e80634a75a` ("EDAC, skx: Retrieve and print retry_rd_err_log registers") This is a little trickier than on Skylake because of potential interference with BIOS use of the same registers. The default behavior is to ignore these registers. A module parameter retry_rd_err_log(default=0) controls the mode of operation: - 0=off : Default. - 1=bios : Linux doesn't reset any control bits, but just reports values. This is "no harm" mode, but it may miss reporting some data. - 2=linux: Linux tries to take control and resets mode bits, clears valid/UC bits after reading. This should be more reliable (especially if BIOS interference is reduced by disabling eMCA reporting mode in BIOS setup). Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210818175701.1611513-3-tony.luck@intel.com Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:46 +08:00
Youquan Song	0e9e08f722	Intel: Fix backport issue for MCA recovery compare to down_read_killable, down_read is better and closer to upstream newer function mmap_read_lock semantic. The issues backporting patch: mm,hwpoison: send SIGBUS with error virutal address commit `a3f5d80ea4` upstream Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:46 +08:00
Naoya Horiguchi	45563d00c6	mm,hwpoison: send SIGBUS with error virutal address commit `a3f5d80ea4` upstream Now an action required MCE in already hwpoisoned address surely sends a SIGBUS to current process, but the SIGBUS doesn't convey error virtual address. That's not optimal for hwpoison-aware applications. To fix the issue, make memory_failure() call kill_accessing_process(), that does pagetable walk to find the error virtual address. It could find multiple virtual addresses for the same error page, and it seems hard to tell which virtual address is correct one. But that's rare and sending incorrect virtual address could be better than no address. So let's report the first found virtual address for now. [naoya.horiguchi@nec.com: fix walk_page_range() return] Link: https://lkml.kernel.org/r/20210603051055.GA244241@hori.linux.bs1.fc.nec.co.jp Link: https://lkml.kernel.org/r/20210521030156.2612074-4-nao.horiguchi@gmail.com Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jue Wang <juew@google.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:45 +08:00
Naoya Horiguchi	082f537775	mm/hwpoison: do not lock page again when me_huge_page() successfully recovers commit `ea6d063010` upstream Currently me_huge_page() temporary unlocks page to perform some actions then locks it again later. My testcase (which calls hard-offline on some tail page in a hugetlb, then accesses the address of the hugetlb range) showed that page allocation code detects this page lock on buddy page and printed out "BUG: Bad page state" message. check_new_page_bad() does not consider a page with __PG_HWPOISON as bad page, so this flag works as kind of filter, but this filtering doesn't work in this case because the "bad page" is not the actual hwpoisoned page. So stop locking page again. Actions to be taken depend on the page type of the error, so page unlocking should be done in ->action() callbacks. So let's make it assumed and change all existing callbacks that way. Link: https://lkml.kernel.org/r/20210609072029.74645-1-nao.horiguchi@gmail.com Fixes: commit `78bb920344` ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:45 +08:00
Aili Yao	35678e206a	mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned commit `47af12bae1` upstream When memory_failure() is called with MF_ACTION_REQUIRED on the page that has already been hwpoisoned, memory_failure() could fail to send SIGBUS to the affected process, which results in infinite loop of MCEs. Currently memory_failure() returns 0 if it's called for already hwpoisoned page, then the caller, kill_me_maybe(), could return without sending SIGBUS to current process. An action required MCE is raised when the current process accesses to the broken memory, so no SIGBUS means that the current process continues to run and access to the error page again soon, so running into MCE loop. This issue can arise for example in the following scenarios: - Two or more threads access to the poisoned page concurrently. If local MCE is enabled, MCE handler independently handles the MCE events. So there's a race among MCE events, and the second or latter threads fall into the situation in question. - If there was a precedent memory error event and memory_failure() for the event failed to unmap the error page for some reason, the subsequent memory access to the error page triggers the MCE loop situation. To fix the issue, make memory_failure() return an error code when the error page has already been hwpoisoned. This allows memory error handler to control how it sends signals to userspace. And make sure that any process touching a hwpoisoned page should get a SIGBUS even in "already hwpoisoned" path of memory_failure() as is done in page fault path. Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com Signed-off-by: Aili Yao <yaoaili@kingsoft.com> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Jue Wang <juew@google.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:45 +08:00
Tony Luck	70d3d660fa	mm/memory-failure: use a mutex to avoid memory_failure() races commit `171936ddaf` upstream Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5. I wrote this patchset to materialize what I think is the current allowable solution mentioned by the previous discussion [1]. I simply borrowed Tony's mutex patch and Aili's return code patch, then I queued another one to find error virtual address in the best effort manner. I know that this is not a perfect solution, but should work for some typical case. [1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/ This patch (of 2): There can be races when multiple CPUs consume poison from the same page. The first into memory_failure() atomically sets the HWPoison page flag and begins hunting for tasks that map this page. Eventually it invalidates those mappings and may send a SIGBUS to the affected tasks. But while all that work is going on, other CPUs see a "success" return code from memory_failure() and so they believe the error has been handled and continue executing. Fix by wrapping most of the internal parts of memory_failure() in a mutex. [akpm@linux-foundation.org: make mf_mutex local to memory_failure()] Link: https://lkml.kernel.org/r/20210521030156.2612074-1-nao.horiguchi@gmail.com Link: https://lkml.kernel.org/r/20210521030156.2612074-2-nao.horiguchi@gmail.com Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: Jue Wang <juew@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:44 +08:00
jschoenh	b6100000f2	x86/mce: Take action on UCNA/Deferred errors again commit `8438b84ab4` upstream Commit `fa92c58694` ("x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll") added handling of UCNA and Deferred errors by adding them to the ring for SRAO errors. Later, commit `fd4cf79fcc` ("x86/mce: Remove the MCE ring for Action Optional errors") switched storage from the SRAO ring to the unified pool that is still in use today. In order to only act on the intended errors, a filter for MCE_AO_SEVERITY is used -- effectively removing handling of UCNA/Deferred errors again. Extend the severity filter to include UCNA/Deferred errors again. Also, generalize the naming of the notifier from SRAO to UC to capture the extended scope. Note, that this change may cause a message like the following to appear, as the same address may be reported as SRAO and as UCNA: Memory failure: 0x5fe3284: already hardware poisoned Technically, this is a return to previous behavior. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200103150722.20313-2-jschoenh@amazon.de Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:44 +08:00
Petr Mladek	ca40e0017d	printk: Prepare for nested printk_nmi_enter() commit `8c4e93c362` upstream There is plenty of space in the printk_context variable. Reserve one byte there for the NMI context to be on the safe side. It should never overflow. The BUG_ON(in_nmi() == NMI_MASK) in nmi_enter() will trigger much earlier. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Link: https://lkml.kernel.org/r/20200505134100.681374113@linutronix.de Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: caelli <caelli@tencent.com> Reviewed-by: Bin Lai <robinlai@tencent.com>	2024-06-11 20:41:44 +08:00
Jiri Slaby	2bb32956d6	tty: use new tty_insert_flip_string_and_push_buffer() in pty_write() commit `a501ab75e7` upstream. There is a race in pty_write(). pty_write() can be called in parallel with e.g. ioctl(TIOCSTI) or ioctl(TCXONC) which also inserts chars to the buffer. Provided, tty_flip_buffer_push() in pty_write() is called outside the lock, it can commit inconsistent tail. This can lead to out of bounds writes and other issues. See the Link below. To fix this, we have to introduce a new helper called tty_insert_flip_string_and_push_buffer(). It does both tty_insert_flip_string() and tty_flip_buffer_commit() under the port lock. It also calls queue_work(), but outside the lock. See `71a174b39f` (pty: do tty_flip_buffer_push without port->lock in pty_write) for the reasons. Keep the helper internal-only (in drivers' tty.h). It is not intended to be used widely. Link: https://seclists.org/oss-sec/2022/q2/155 Fixes: `71a174b39f` (pty: do tty_flip_buffer_push without port->lock in pty_write) Cc: 一只狗 <chennbnbnb@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20220707082558.9250-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tao Wu <tallwu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:41:43 +08:00
Jiri Slaby	fd4ff95a7f	tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push() commit `716b105802` upstream. We will need this new helper in the next patch. Cc: Hillf Danton <hdanton@sina.com> Cc: 一只狗 <chennbnbnb@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20220707082558.9250-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tao Wu <tallwu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:41:43 +08:00
Jiri Slaby	1dad12ddd9	tty: drop tty_schedule_flip() commit `5db96ef23b` upstream. Since commit `a9c3f68f3c` (tty: Fix low_latency BUG) in 2014, tty_flip_buffer_push() is only a wrapper to tty_schedule_flip(). All users were converted in the previous patches, so remove tty_schedule_flip() completely while inlining its body into tty_flip_buffer_push(). One less exported function. For kabi compatibility, we don't drop tty_schedule_flip(), only modify tty_flip_buffer_push() in our tk4. Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20211122111648.30379-4-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tao Wu <tallwu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:41:42 +08:00
Michael Ellerman	0d939f0b69	powerpc/32: Fix overread/overwrite of thread_struct via ptrace commit `8e12784444` upstream. powerpc/32: Fix overread/overwrite of thread_struct via ptrace The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process to read/write registers of another process. To get/set a register, the API takes an index into an imaginary address space called the "USER area", where the registers of the process are laid out in some fashion. The kernel then maps that index to a particular register in its own data structures and gets/sets the value. The API only allows a single machine-word to be read/written at a time. So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels. The way floating point registers (FPRs) are addressed is somewhat complicated, because double precision float values are 64-bit even on 32-bit CPUs. That means on 32-bit kernels each FPR occupies two word-sized locations in the USER area. On 64-bit kernels each FPR occupies one word-sized location in the USER area. Internally the kernel stores the FPRs in an array of u64s, or if VSX is enabled, an array of pairs of u64s where one half of each pair stores the FPR. Which half of the pair stores the FPR depends on the kernel's endianness. To handle the different layouts of the FPRs depending on VSX/no-VSX and big/little endian, the TS_FPR() macro was introduced. Unfortunately the TS_FPR() macro does not take into account the fact that the addressing of each FPR differs between 32-bit and 64-bit kernels. It just takes the index into the "USER area" passed from userspace and indexes into the fp_state.fpr array. On 32-bit there are 64 indexes that address FPRs, but only 32 entries in the fp_state.fpr array, meaning the user can read/write 256 bytes past the end of the array. Because the fp_state sits in the middle of the thread_struct there are various fields than can be overwritten, including some pointers. As such it may be exploitable. It has also been observed to cause systems to hang or otherwise misbehave when using gdbserver, and is probably the root cause of this report which could not be easily reproduced: https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@keymile.com/ Rather than trying to make the TS_FPR() macro even more complicated to fix the bug, or add more macros, instead add a special-case for 32-bit kernels. This is more obvious and hopefully avoids a similar bug happening again in future. Note that because 32-bit kernels never have VSX enabled the code doesn't need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to ensure that 32-bit && VSX is never enabled. Fixes: `87fec0514f` ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds") Cc: stable@vger.kernel.org # v3.13+ Reported-by: Ariel Miculas <ariel.miculas@belden.com> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au Signed-off-by: Tao Wu <tallwu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:41:42 +08:00
Sarthak Kukreti	0623c2c949	dm verity: set DM_TARGET_IMMUTABLE feature flag commit `4caae58406` upstream. The device-mapper framework provides a mechanism to mark targets as immutable (and hence fail table reloads that try to change the target type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's feature flags to prevent switching the verity target with a different target type. Fixes: `a4ffc15219` ("dm: add verity target") Cc: stable@vger.kernel.org Signed-off-by: Sarthak Kukreti <sarthakkukreti@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Tao Wu <tallwu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:41:42 +08:00
Eric Snowberg	5df276cf79	lockdown: Fix kexec lockdown bypass with ima policy commit `543ce63b66` upstream. The lockdown LSM is primarily used in conjunction with UEFI Secure Boot. This LSM may also be used on machines without UEFI. It can also be enabled when UEFI Secure Boot is disabled. One of lockdown's features is to prevent kexec from loading untrusted kernels. Lockdown can be enabled through a bootparam or after the kernel has booted through securityfs. If IMA appraisal is used with the "ima_appraise=log" boot param, lockdown can be defeated with kexec on any machine when Secure Boot is disabled or unavailable. IMA prevents setting "ima_appraise=log" from the boot param when Secure Boot is enabled, but this does not cover cases where lockdown is used without Secure Boot. To defeat lockdown, boot without Secure Boot and add ima_appraise=log to the kernel command line; then: $ echo "integrity" > /sys/kernel/security/lockdown $ echo "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig" > \ /sys/kernel/security/ima/policy $ kexec -ls unsigned-kernel Add a call to verify ima appraisal is set to "enforce" whenever lockdown is enabled. This fixes CVE-2022-21505. Cc: stable@vger.kernel.org Fixes: `29d3c1c8df` ("kexec: Allow kexec_file() with appropriate IMA policy when locked down") Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: John Haxby <john.haxby@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tao Wu <tallwu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:41:41 +08:00
Duoming Zhou	30e33d2310	nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs commit `d270453a0d` upstream. There are destructive operations such as nfcmrvl_fw_dnld_abort and gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware, gpio and so on could be destructed while the upper layer functions such as nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads to double-free, use-after-free and null-ptr-deref bugs. There are three situations that could lead to double-free bugs. The first situation is shown below: (Thread 1) \| (Thread 2) nfcmrvl_fw_dnld_start \| ... \| nfcmrvl_nci_unregister_dev release_firmware() \| nfcmrvl_fw_dnld_abort kfree(fw) //(1) \| fw_dnld_over \| release_firmware ... \| kfree(fw) //(2) \| ... The second situation is shown below: (Thread 1) \| (Thread 2) nfcmrvl_fw_dnld_start \| ... \| mod_timer \| (wait a time) \| fw_dnld_timeout \| nfcmrvl_nci_unregister_dev fw_dnld_over \| nfcmrvl_fw_dnld_abort release_firmware \| fw_dnld_over kfree(fw) //(1) \| release_firmware ... \| kfree(fw) //(2) The third situation is shown below: (Thread 1) \| (Thread 2) nfcmrvl_nci_recv_frame \| if(..->fw_download_in_progress)\| nfcmrvl_fw_dnld_recv_frame \| queue_work \| \| fw_dnld_rx_work \| nfcmrvl_nci_unregister_dev fw_dnld_over \| nfcmrvl_fw_dnld_abort release_firmware \| fw_dnld_over kfree(fw) //(1) \| release_firmware \| kfree(fw) //(2) The firmware struct is deallocated in position (1) and deallocated in position (2) again. The crash trace triggered by POC is like below: BUG: KASAN: double-free or invalid-free in fw_dnld_over Call Trace: kfree fw_dnld_over nfcmrvl_nci_unregister_dev nci_uart_tty_close tty_ldisc_kill tty_ldisc_hangup __tty_hangup.part.0 tty_release ... What's more, there are also use-after-free and null-ptr-deref bugs in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev, then, we dereference firmware, gpio or the members of priv->fw_dnld in nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen. This patch reorders destructive operations after nci_unregister_device in order to synchronize between cleanup routine and firmware download routine. The nci_unregister_device is well synchronized. If the device is detaching, the firmware download routine will goto error. If firmware download routine is executing, nci_unregister_device will wait until firmware download routine is finished. Fixes: `3194c68701` ("NFC: nfcmrvl: add firmware download support") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Tao Wu <tallwu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:41:41 +08:00
Peter Zijlstra	1e450e34ef	perf: Fix sys_perf_event_open() race against self commit `3ac6487e58` upstream. Norbert reported that it's possible to race sys_perf_event_open() such that the looser ends up in another context from the group leader, triggering many WARNs. The move_group case checks for races against itself, but the !move_group case doesn't, seemingly relying on the previous group_leader->ctx == ctx check. However, that check is racy due to not holding any locks at that time. Therefore, re-check the result after acquiring locks and bailing if they no longer match. Additionally, clarify the not_move_group case from the move_group-vs-move_group race. Fixes: `f63a8daa58` ("perf: Fix event->ctx locking") Reported-by: Norbert Slusarek <nslusarek@gmx.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tao Wu <tallwu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:41:41 +08:00
Daniel Thompson	5059594a10	lockdown: also lock down previous kgdb use commit `eadb2f47a3` upstream. KGDB and KDB allow read and write access to kernel memory, and thus should be restricted during lockdown. An attacker with access to a serial port (for example, via a hypervisor console, which some cloud vendors provide over the network) could trigger the debugger so it is important that the debugger respect the lockdown mode when/if it is triggered. Fix this by integrating lockdown into kdb's existing permissions mechanism. Unfortunately kgdb does not have any permissions mechanism (although it certainly could be added later) so, for now, kgdb is simply and brutally disabled by immediately exiting the gdb stub without taking any action. For lockdowns established early in the boot (e.g. the normal case) then this should be fine but on systems where kgdb has set breakpoints before the lockdown is enacted than "bad things" will happen. CVE: CVE-2022-21499 Co-developed-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tao Wu <tallwu@tencent.com> Reviewed-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:41:40 +08:00
johnnyaiai	01b127db01	config/performance: Enable CONFIG_PREEMPT_NONE by default Signed-off-by: johnnyaiai <johnnyaiai@tencent.com> Reviewed-by: flyingpeng <flyingpeng@tencent.com>	2024-06-11 20:41:40 +08:00
johnnyaiai	4e616de0dc	config/performance: Disable PV_SPINLOCK by default Signed-off-by: johnnyaiai <johnnyaiai@tencent.com> Reviewed-by: flyingpeng <flyingpeng@tencent.com>	2024-06-11 20:41:39 +08:00
Juergen Gross	8881bf067a	xen/gnttab: fix gnttab_end_foreign_access() without page specified Commit `42baefac63` upstream. gnttab_end_foreign_access() is used to free a grant reference and optionally to free the associated page. In case the grant is still in use by the other side processing is being deferred. This leads to a problem in case no page to be freed is specified by the caller: the caller doesn't know that the page is still mapped by the other side and thus should not be used for other purposes. The correct way to handle this situation is to take an additional reference to the granted page in case handling is being deferred and to drop that reference when the grant reference could be freed finally. This requires that there are no users of gnttab_end_foreign_access() left directly repurposing the granted page after the call, as this might result in clobbered data or information leaks via the not yet freed grant reference. This is part of CVE-2022-23041 / XSA-396. Reported-by: Simon Gaiser <simon@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: samuelliao <samuelliao@tencent.com>	2024-06-11 20:41:39 +08:00
Juergen Gross	c9d4e35e5b	xen/pvcalls: use alloc/free_pages_exact() Commit `b0576cc9c6` upstream. Instead of __get_free_pages() and free_pages() use alloc_pages_exact() and free_pages_exact(). This is in preparation of a change of gnttab_end_foreign_access() which will prohibit use of high-order pages. This is part of CVE-2022-23041 / XSA-396. Reported-by: Simon Gaiser <simon@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: samuelliao <samuelliao@tencent.com>	2024-06-11 20:41:39 +08:00

1 2 3 4 5 ...

873586 Commits All Branches Search

873586 Commits

All Branches