OpenCloudOS-Kernel

Frederic Weisbecker 4991cb2d43 rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation	2024-08-14 13:58:41 +02:00

[ Upstream commit 55d4669ef1b76823083caecfab12a8bd2ccdcf64 ]

When rcu_barrier() calls rcu_rdp_cpu_online() and observes a CPU off
rnp->qsmaskinitnext, it means that all accesses from the offline CPU
preceding the CPUHP_TEARDOWN_CPU are visible to RCU barrier, including
callbacks expiration and counter updates.

However interrupts can still fire after stop_machine() re-enables
interrupts and before rcutree_report_cpu_dead(). The related accesses
happening between CPUHP_TEARDOWN_CPU and rnp->qsmaskinitnext clearing
are _NOT_ guaranteed to be seen by rcu_barrier() without proper
ordering, especially when callbacks are invoked there to the end, making
rcutree_migrate_callback() bypass barrier_lock.

The following theoretical race example can make rcu_barrier() hang:

CPU 0                                               CPU 1
-----                                               -----
//cpu_down()
smpboot_park_threads()
//ksoftirqd is parked now
<IRQ>
rcu_sched_clock_irq()
   invoke_rcu_core()
do_softirq()
   rcu_core()
      rcu_do_batch()
         // callback storm
         // rcu_do_batch() returns
         // before completing all
         // of them
   // do_softirq also returns early because of
   // timeout. It defers to ksoftirqd but
   // it's parked
</IRQ>
stop_machine()
   take_cpu_down()
                                                    rcu_barrier()
                                                        spin_lock(barrier_lock)
                                                        // observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
<IRQ>
do_softirq()
   rcu_core()
      rcu_do_batch()
         //completes all pending callbacks
         //smp_mb() implied _after_ callback number dec
</IRQ>

rcutree_report_cpu_dead()
   rnp->qsmaskinitnext &= ~rdp->grpmask;

rcutree_migrate_callback()
   // no callback, early return without locking
   // barrier_lock
                                                        //observes !rcu_rdp_cpu_online(rdp)
                                                        rcu_barrier_entrain()
                                                           rcu_segcblist_entrain()
                                                              // Observe rcu_segcblist_n_cbs(rsclp) == 0
                                                              // because no barrier between reading
                                                              // rnp->qsmaskinitnext and rsclp->len
                                                              rcu_segcblist_add_len()
                                                                 smp_mb__before_atomic()
                                                                 // will now observe the 0 count and empty
                                                                 // list, but too late, we enqueue regardless
                                                                 WRITE_ONCE(rsclp->len, rsclp->len + v);
                                                        // ignored barrier callback
                                                        // rcu barrier stall...

This could be solved with a read memory barrier, enforcing the message
passing between rnp->qsmaskinitnext and rsclp->len, matching the full
memory barrier after rsclp->len addition in rcu_segcblist_add_len()
performed at the end of rcu_do_batch().

However the rcu_barrier() is complicated enough and probably doesn't
need too many more subtleties. CPU down is a slowpath and the
barrier_lock seldom contended. Solve the issue with unconditionally
locking the barrier_lock on rcutree_migrate_callbacks(). This makes sure
that either rcu_barrier() sees the empty queue or its entrained callback
will be migrated.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
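The locking rule described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: struct model_rdp, model_cpu_down(), model_rcu_barrier() and model_run_trials() are hypothetical names, a pthread mutex stands in for barrier_lock, and C11 sequentially consistent atomics stand in for the kernel's accessors. The online CPU path of rcu_barrier() is left out. The point it tries to show is that once the emptiness check in the migration step happens under the lock, an entrained barrier callback is always picked up by the migration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the outgoing CPU's callback state. */
struct model_rdp {
    atomic_int n_cbs;    /* callbacks queued on the outgoing CPU */
    atomic_bool online;  /* models the CPU's bit in rnp->qsmaskinitnext */
    int migrated;        /* callbacks moved to a surviving CPU */
    bool entrained;      /* true once the barrier queued its callback */
};

/* Models barrier_lock; a plain mutex is an assumption of this sketch. */
static pthread_mutex_t barrier_lock = PTHREAD_MUTEX_INITIALIZER;

/* Outgoing CPU: a late IRQ drains the queue, then the CPU goes offline. */
static void *model_cpu_down(void *arg)
{
    struct model_rdp *rdp = arg;

    atomic_store(&rdp->n_cbs, 0);       /* late batch completes all callbacks */
    atomic_store(&rdp->online, false);  /* CPU leaves the online mask */

    /* Take the lock first, and only then test for an empty queue. */
    pthread_mutex_lock(&barrier_lock);
    int n = atomic_load(&rdp->n_cbs);
    if (n != 0) {
        rdp->migrated += n;             /* hand callbacks to a survivor */
        atomic_store(&rdp->n_cbs, 0);
    }
    pthread_mutex_unlock(&barrier_lock);
    return NULL;
}

/* Barrier side: entrain on an offline CPU that appears to have callbacks. */
static void *model_rcu_barrier(void *arg)
{
    struct model_rdp *rdp = arg;

    pthread_mutex_lock(&barrier_lock);
    if (atomic_load(&rdp->n_cbs) != 0 && !atomic_load(&rdp->online)) {
        atomic_fetch_add(&rdp->n_cbs, 1);  /* queue the barrier callback */
        rdp->entrained = true;
    }
    pthread_mutex_unlock(&barrier_lock);
    return NULL;
}

/* Runs both sides concurrently; returns lost callbacks, or -1 on error. */
int model_run_trials(int trials)
{
    int lost = 0;

    for (int i = 0; i < trials; i++) {
        struct model_rdp rdp;
        pthread_t down, barrier;

        atomic_init(&rdp.n_cbs, 3);
        atomic_init(&rdp.online, true);
        rdp.migrated = 0;
        rdp.entrained = false;

        if (pthread_create(&down, NULL, model_cpu_down, &rdp) != 0)
            return -1;
        if (pthread_create(&barrier, NULL, model_rcu_barrier, &rdp) != 0) {
            pthread_join(down, NULL);
            return -1;
        }
        pthread_join(down, NULL);
        pthread_join(barrier, NULL);

        /* A callback is lost if it was entrained but never migrated. */
        if (rdp.entrained && rdp.migrated == 0)
            lost++;
    }
    return lost;
}
```

Calling model_run_trials(1000) from a small main() (built with -pthread) should report zero lost callbacks. If the emptiness test in model_cpu_down() were moved in front of pthread_mutex_lock(), the migration could return early while the barrier side is still about to entrain, which is the window the commit message describes.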
Kconfig	rcu: Employ jiffies-based backstop to callback time limit	2023-05-11 13:42:39 -07:00
Kconfig.debug	rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts	2023-01-09 12:09:52 -08:00
Makefile	rcuperf: Change rcuperf to rcuscale	2020-08-24 18:39:24 -07:00
rcu.h	rcu: Introduce rcu_cpu_online()	2024-01-10 17:16:56 +01:00
rcu_segcblist.c	rcu: Throttle callback invocation based on number of ready callbacks	2023-01-03 17:28:34 -08:00
rcu_segcblist.h	rcu: Throttle callback invocation based on number of ready callbacks	2023-01-03 17:28:34 -08:00
rcuscale.c	rcuscale: Move rcu_scale_writer() schedule_timeout_uninterruptible() to _idle()	2023-07-14 15:01:49 -07:00
rcutorture.c	rcutorture: Fix rcu_torture_fwd_cb_cr() data race	2024-08-14 13:58:41 +02:00
refscale.c	refscale: Add a "jiffies" test	2023-07-14 15:01:04 -07:00
srcutiny.c	rcu: Annotate SRCU's update-side lockdep dependencies	2023-03-27 11:15:59 -07:00
srcutree.c	srcu: Only accelerate on enqueue time	2023-11-28 17:19:36 +00:00
sync.c	rcu/sync: Use call_rcu_hurry() instead of call_rcu	2022-11-29 14:04:33 -08:00
tasks.h	rcu/tasks: Fix stale task snaphot for Tasks Trace	2024-08-03 08:53:20 +02:00
tiny.c	rcu: Refactor kvfree_call_rcu() and high-level helpers	2023-01-03 17:48:40 -08:00
tree.c	rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation	2024-08-14 13:58:41 +02:00
tree.h	rcu/tree: Defer setting of jiffies during stall reset	2023-11-28 17:20:02 +00:00
tree_exp.h	rcu/exp: Handle RCU expedited grace period kworker allocation failure	2024-03-26 18:19:17 -04:00
tree_nocb.h	rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock()	2024-04-13 13:07:34 +02:00
tree_plugin.h	rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp	2023-05-11 13:42:39 -07:00
tree_stall.h	rcu: Fix buffer overflow in print_cpu_stall_info()	2024-06-12 11:11:32 +02:00
update.c	Merge branch 'stall.2023.01.09a' into HEAD	2023-02-02 16:40:07 -08:00