OpenCloudOS-Kernel/arch/powerpc/lib
Nysal Jan K.A. d84ab6661e powerpc/qspinlock: Fix deadlock in MCS queue
commit 734ad0af3609464f8f93e00b6c0de1e112f44559 upstream.

If an interrupt occurs in queued_spin_lock_slowpath() after we increment
qnodesp->count and before node->lock is initialized, another CPU might
see stale lock values in get_tail_qnode(). If the stale lock value happens
to match the lock on that CPU, then we write to the "next" pointer of
the wrong qnode. This causes a deadlock as the former CPU, once it becomes
the head of the MCS queue, will spin indefinitely until it's "next" pointer
is set by its successor in the queue.

Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results in
occasional lockups similar to the following:

   $ stress-ng --all 128 --vm-bytes 80% --aggressive \
               --maximize --oomable --verify  --syslog \
               --metrics  --times  --timeout 5m

   watchdog: CPU 15 Hard LOCKUP
   ......
   NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490
   LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
   Call Trace:
    0xc000002cfffa3bf0 (unreliable)
    _raw_spin_lock+0x6c/0x90
    raw_spin_rq_lock_nested.part.135+0x4c/0xd0
    sched_ttwu_pending+0x60/0x1f0
    __flush_smp_call_function_queue+0x1dc/0x670
    smp_ipi_demux_relaxed+0xa4/0x100
    xive_muxed_ipi_action+0x20/0x40
    __handle_irq_event_percpu+0x80/0x240
    handle_irq_event_percpu+0x2c/0x80
    handle_percpu_irq+0x84/0xd0
    generic_handle_irq+0x54/0x80
    __do_irq+0xac/0x210
    __do_IRQ+0x74/0xd0
    0x0
    do_IRQ+0x8c/0x170
    hardware_interrupt_common_virt+0x29c/0x2a0
   --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490
   ......
   NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490
   LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
   --- interrupt: 500
    0xc0000029c1a41d00 (unreliable)
    _raw_spin_lock+0x6c/0x90
    futex_wake+0x100/0x260
    do_futex+0x21c/0x2a0
    sys_futex+0x98/0x270
    system_call_exception+0x14c/0x2f0
    system_call_vectored_common+0x15c/0x2ec

The following code flow illustrates how the deadlock occurs.
For the sake of brevity, assume that both locks (A and B) are
contended and we call the queued_spin_lock_slowpath() function.

        CPU0                                   CPU1
        ----                                   ----
  spin_lock_irqsave(A)                          |
  spin_unlock_irqrestore(A)                     |
    spin_lock(B)                                |
         |                                      |
         ▼                                      |
   id = qnodesp->count++;                       |
  (Note that nodes[0].lock == A)                |
         |                                      |
         ▼                                      |
      Interrupt                                 |
  (happens before "nodes[0].lock = B")          |
         |                                      |
         ▼                                      |
  spin_lock_irqsave(A)                          |
         |                                      |
         ▼                                      |
   id = qnodesp->count++                        |
   nodes[1].lock = A                            |
         |                                      |
         ▼                                      |
  Tail of MCS queue                             |
         |                             spin_lock_irqsave(A)
         ▼                                      |
  Head of MCS queue                             ▼
         |                             CPU0 is previous tail
         ▼                                      |
   Spin indefinitely                            ▼
  (until "nodes[1].next != NULL")      prev = get_tail_qnode(A, CPU0)
                                                |
                                                ▼
                                       prev == &qnodes[CPU0].nodes[0]
                                     (as qnodes[CPU0].nodes[0].lock == A)
                                                |
                                                ▼
                                       WRITE_ONCE(prev->next, node)
                                                |
                                                ▼
                                        Spin indefinitely
                                     (until nodes[0].locked == 1)

Thanks to Saket Kumar Bhaskar for help with recreating the issue

Fixes: 84990b1695 ("powerpc/qspinlock: add mcs queueing for contended waiters")
Cc: stable@vger.kernel.org # v6.2+
Reported-by: Geetika Moolchandani <geetika@linux.ibm.com>
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Reported-by: Jijo Varghese <vargjijo@in.ibm.com>
Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240829022830.1164355-1-nysal@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12 11:11:25 +02:00
..
Makefile powerpc: xor_vmx: Add '-mhard-float' to CFLAGS 2024-04-03 15:28:26 +02:00
checksum_32.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
checksum_64.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
checksum_wrappers.c net: unexport csum_and_copy_{from,to}_user 2022-04-29 14:37:59 -07:00
code-patching.c powerpc/code-patching: Fix oops with DEBUG_VM enabled 2022-12-16 23:59:43 +11:00
copy_32.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
copy_mc_64.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
copypage_64.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
copypage_power7.S powerpc: add CFUNC assembly label annotation 2023-04-20 12:54:24 +10:00
copyuser_64.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
copyuser_power7.S powerpc: add CFUNC assembly label annotation 2023-04-20 12:54:24 +10:00
crtsavres.S powerpc/64: Do not create new section for save/restore functions 2017-05-30 14:59:51 +10:00
div64.S treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
error-inject.c powerpc/64s: avoid reloading (H)SRR registers if they are still valid 2021-06-25 00:06:55 +10:00
feature-fixups-test.S powerpc: Test prefixed instructions in feature fixups 2020-05-19 00:11:02 +10:00
feature-fixups.c powerpc/features: Add capability to update mmu features later 2023-08-02 22:22:17 +10:00
hweight_64.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
ldstfp.S powerpc updates for 5.3 2019-07-13 16:08:36 -07:00
locks.c powerpc/pseries: Move some PAPR paravirt functions to their own file 2020-07-26 23:34:26 +10:00
mem_64.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
memcmp_32.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
memcmp_64.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
memcpy_64.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
memcpy_power7.S powerpc: add CFUNC assembly label annotation 2023-04-20 12:54:24 +10:00
pmem.c powerpc: Remove memcpy_page_flushcache() 2023-03-29 23:53:02 +11:00
qspinlock.c powerpc/qspinlock: Fix deadlock in MCS queue 2024-09-12 11:11:25 +02:00
quad.S treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
restart_table.c powerpc/64s: add a table of implicit soft-masked addresses 2021-06-30 22:21:20 +10:00
rheap.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
sstep.c powerpc/lib: Validate size for vector operations 2024-02-05 20:14:14 +00:00
string.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
string_32.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
string_64.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
strlen_32.S powerpc: replace #include <asm/export.h> with #include <linux/export.h> 2023-08-16 23:54:48 +10:00
test-code-patching.c powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests 2021-12-23 22:36:58 +11:00
test_emulate_step.c powerpc/ppc-opcode: Define and use PPC_RAW_SETB() 2022-07-27 21:36:05 +10:00
test_emulate_step_exec_instr.S powerpc: add definition for pt_regs offset within an interrupt frame 2022-12-02 17:54:08 +11:00
vmx-helper.c powerpc: Fix reschedule bug in KUAP-unlocked user copy 2022-10-18 22:46:19 +11:00
xor_vmx.c lib/xor: make xor prototypes more friendly to compiler vectorization 2022-02-11 20:39:39 +11:00
xor_vmx.h lib/xor: make xor prototypes more friendly to compiler vectorization 2022-02-11 20:39:39 +11:00
xor_vmx_glue.c lib/xor: make xor prototypes more friendly to compiler vectorization 2022-02-11 20:39:39 +11:00