OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	3fe2f7446f	Changes in this cycle were: - Cleanups for SCHED_DEADLINE - Tracing updates/fixes - CPU Accounting fixes - First wave of changes to optimize the overhead of the scheduler build, from the fast-headers tree - including placeholder _api.h headers for later header split-ups. - Preempt-dynamic using static_branch() for ARM64 - Isolation housekeeping mask rework; preperatory for further changes - NUMA-balancing: deal with CPU-less nodes - NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD) - Updates to RSEQ UAPI in preparation for glibc usage - Lots of RSEQ/selftests, for same - Add Suren as PSI co-maintainer Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI5rg8RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hGrw/+M3QOk6fH7G48wjlNnBvcOife6ls+Ni4k ixOAcF4JKoixO8HieU5vv0A7yf/83tAa6fpeXeMf1hkCGc0NSlmLtuIux+WOmoAL LzCyDEYfiP8KnVh0A1Tui/lK0+AkGo21O6ADhQE2gh8o2LpslOHQMzvtyekSzeeb mVxMYQN+QH0m518xdO2D8IQv9ctOYK0eGjmkqdNfntOlytypPZHeNel/tCzwklP/ dElJUjNiSKDlUgTBPtL3DfpoLOI/0mHF2p6NEXvNyULxSOqJTu8pv9Z2ADb2kKo1 0D56iXBDngMi9MHIJLgvzsA8gKzHLFSuPbpODDqkTZCa28vaMB9NYGhJ643NtEie IXTJEvF1rmNkcLcZlZxo0yjL0fjvPkczjw4Vj27gbrUQeEBfb4mfuI4BRmij63Ep qEkgQTJhduCqqrQP1rVyhwWZRk1JNcVug+F6N42qWW3fg1xhj0YSrLai2c9nPez6 3Zt98H8YGS1Z/JQomSw48iGXVqfTp/ETI7uU7jqHK8QcjzQ4lFK5H4GZpwuqGBZi NJJ1l97XMEas+rPHiwMEN7Z1DVhzJLCp8omEj12QU+tGLofxxwAuuOVat3CQWLRk f80Oya3TLEgd22hGIKDRmHa22vdWnNQyS0S15wJotawBzQf+n3auS9Q3/rh979+t ES/qvlGxTIs= =Z8uT -----END PGP SIGNATURE----- Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Cleanups for SCHED_DEADLINE - Tracing updates/fixes - CPU Accounting fixes - First wave of changes to optimize the overhead of the scheduler build, from the fast-headers tree - including placeholder _api.h headers for later header split-ups. - Preempt-dynamic using static_branch() for ARM64 - Isolation housekeeping mask rework; preperatory for further changes - NUMA-balancing: deal with CPU-less nodes - NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD) - Updates to RSEQ UAPI in preparation for glibc usage - Lots of RSEQ/selftests, for same - Add Suren as PSI co-maintainer * tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits) sched/headers: ARM needs asm/paravirt_api_clock.h too sched/numa: Fix boot crash on arm64 systems headers/prep: Fix header to build standalone: <linux/psi.h> sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y cgroup: Fix suspicious rcu_dereference_check() usage warning sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains sched/deadline,rt: Remove unused parameter from pick_next_[rt\|dl]_entity() sched/deadline,rt: Remove unused functions for !CONFIG_SMP sched/deadline: Use __node_2_[pdl\|dle]() and rb_first_cached() consistently sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy() sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file sched/deadline: Remove unused def_dl_bandwidth sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE sched/tracing: Don't re-read p->state when emitting sched_switch event sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race sched/cpuacct: Remove redundant RCU read lock sched/cpuacct: Optimize away RCU read lock sched/cpuacct: Fix charge percpu cpuusage sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies ...	2022-03-22 14:39:12 -07:00
Linus Torvalds	35dc0352bb	RCU pull request for v5.18 This pull request contains the following branches: exp.2022.02.24a: Contains a fix for idle detection from Neeraj Upadhyay and missing access marking detected by KCSAN. fixes.2022.02.14a: Miscellaneous fixes. rcu_barrier.2022.02.08a: Reduces coupling between rcu_barrier() and CPU-hotplug operations, so that rcu_barrier() no longer needs to do cpus_read_lock(). This may also someday allow system boot to bring CPUs online concurrently. rcu-tasks.2022.02.08a: Enable more aggressive movement to per-CPU queueing when reacting to excessive lock contention due to workloads placing heavy update-side stress on RCU tasks. rt.2022.02.01b: Improvements to RCU priority boosting, including changes from Neeraj Upadhyay, Zqiang, and Alison Chaiken. torture.2022.02.01b: Various fixes improving test robustness and debug information. torturescript.2022.02.08a: Add tests for SRCU size transitions, further compress torture.sh build products, and improve debug output. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmIusb0THHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jAklD/9VXLK7crcg2YeRXUIg1IOdnancsVCV MNtTfxNYqYIis+W2UfuHKuQu2yEXF5fihdY0J9TQv0byHsprp6FIZT+i1An4Ukgd 0vyHjd/DaIKgs2txsB1DjhlatWlJUfQuBwhtNUkpYFLFwKdCI1l813bPbNlL+GiL p0ZejVMpBC5HgE6sDOtaaQSAB+AEUp+Lgr+yaG/On8hfzwWFKO8KldxhiKY9n07v SNDfKDgXB+80hx4RBVGbkuogV3s9brFULoNRXJy7Uf79DtiY09uazhhA3G0TjO34 zGwmF91dqsXDF/Uz8g4aZO0xYRXUchOrsQ5lgO/GhTVbM9I0wWlMHEk/8WHyBJkU vlXOMuwzBc9/5uwZE3rnkA4a3nkXhPQjLlCr+/I7A/7Vsv9IBW9WSlgMvUN0Qf4S XAwTnIqfErnR60a+L0+HRr5kIV5VoXcxqI/Nv0/4/BMLRubS/c7cYjOTxXNJL9SU 50pv5vty9xk3HSpuz0JAOyLf+PUT773uUQhFr5xCBSCVqbAm5WFg6hWPAgrN/tUS wstBc0wlA73rKVJxeLDQwHc/oT1zTUEzswVZITQ5zLHK0t0GbeR6QHccsdeaJyTe DisX+66A6YQrEuJmx5xUZqjYHqtYLDOBTbHA3ZwQmvjKu8ibWZ8Fg9ioURLCS4bF +FVkp/5KdcAN9w== =ljVY -----END PGP SIGNATURE----- Merge tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Fix idle detection (Neeraj Upadhyay) and missing access marking detected by KCSAN. - Reduce coupling between rcu_barrier() and CPU-hotplug operations, so that rcu_barrier() no longer needs to do cpus_read_lock(). This may also someday allow system boot to bring CPUs online concurrently. - Enable more aggressive movement to per-CPU queueing when reacting to excessive lock contention due to workloads placing heavy update-side stress on RCU tasks. - Improvements to RCU priority boosting, including changes from Neeraj Upadhyay, Zqiang, and Alison Chaiken. - Various fixes improving test robustness and debug information. - Add tests for SRCU size transitions, further compress torture.sh build products, and improve debug output. - Miscellaneous fixes. * tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits) rcu: Replace cpumask_weight with cpumask_empty where appropriate rcu: Remove __read_mostly annotations from rcu_scheduler_active externs rcu: Uninline multi-use function: finish_rcuwait() rcu: Mark writes to the rcu_segcblist structure's ->flags field kasan: Record work creation stack trace with interrupts enabled rcu: Inline __call_rcu() into call_rcu() rcu: Add mutex for rcu boost kthread spawning and affinity setting rcu: Fix description of kvfree_rcu() MAINTAINERS: Add Frederic and Neeraj to their RCU files rcutorture: Provide non-power-of-two Tasks RCU scenarios rcutorture: Test SRCU size transitions torture: Make torture.sh help message match reality rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention rcu-tasks: Use order_base_2() instead of ilog2() rcu: Create and use an rcu_rdp_cpu_online() rcu: Make rcu_barrier() no longer block CPU-hotplug operations rcu: Rework rcu_barrier() and callback-migration logic rcu: Refactor rcu_barrier() empty-list handling rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion torture: Change KVM environment variable to RCUTORTURE ...	2022-03-21 14:00:56 -07:00
Frederic Weisbecker	2984539959	tick/rcu: Remove obsolete rcu_needs_cpu() parameters With the removal of CONFIG_RCU_FAST_NO_HZ, the parameters in rcu_needs_cpu() are not necessary anymore. Simply remove them. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Paul Menzel <pmenzel@molgen.mpg.de>	2022-03-07 23:01:26 +01:00
Paul E. McKenney	d5578190be	Merge branches 'exp.2022.02.24a', 'fixes.2022.02.14a', 'rcu_barrier.2022.02.08a', 'rcu-tasks.2022.02.08a', 'rt.2022.02.01b', 'torture.2022.02.01b' and 'torturescript.2022.02.08a' into HEAD exp.2022.02.24a: Expedited grace-period updates. fixes.2022.02.14a: Miscellaneous fixes. rcu_barrier.2022.02.08a: Make rcu_barrier() no longer exclude CPU hotplug. rcu-tasks.2022.02.08a: RCU-tasks updates. rt.2022.02.01b: Real-time-related updates. torture.2022.02.01b: Torture-test updates. torturescript.2022.02.08a: Torture-test scripting updates.	2022-02-24 09:38:46 -08:00
Ingo Molnar	6255b48aeb	Linux 5.17-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmISrYgeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGg20IAKDZr7rfSHBopjQV Cocw744tom0XuxpvSZpp2GGOOXF+tkswcNNaRIrbGOl1mkyxA7eBZCTMpDeDS9aQ wB0D0Gxx8QBAJp4KgB1W7TB+hIGes/rs8Ve+6iO4ulLLdCVWX/q2boI0aZ7QX9O9 qNi8OsoZQtk6falRvciZFHwV5Av1p2Sy1AW57udQ7DvJ4H98AfKf1u8/z208WWW8 1ixC+qJxQcUcM9vI+7P9Tt7NbFSKv8SvAmqjFY7P+DxQAsVw6KXoqVXykDzeOv0t fUNOE/t0oFZafwtn8h7KBQnwS9lH03+3KkslVZs+iMFyUj/Bar+NVVyKoDhWXtVg /PuMhEg= =eU1o -----END PGP SIGNATURE----- Merge tag 'v5.17-rc5' into sched/core, to resolve conflicts New conflicts in sched/core due to the following upstream fixes: `44585f7bc0` ("psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n") `a06247c680` ("psi: Fix uaf issue when psi trigger is destroyed while being polled") Conflicts: include/linux/psi_types.h kernel/sched/psi.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2022-02-21 11:53:51 +01:00
Frederic Weisbecker	04d4e665a6	sched/isolation: Use single feature type while referring to housekeeping cpumask Refer to housekeeping APIs using single feature types instead of flags. This prevents from passing multiple isolation features at once to housekeeping interfaces, which soon won't be possible anymore as each isolation features will have their own cpumask. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org	2022-02-16 15:57:55 +01:00
Yury Norov	6a2c1d450a	rcu: Replace cpumask_weight with cpumask_empty where appropriate In some places, RCU code calls cpumask_weight() to check if any bit of a given cpumask is set. We can do it more efficiently with cpumask_empty() because cpumask_empty() stops traversing the cpumask as soon as it finds first set bit, while cpumask_weight() counts all bits unconditionally. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-14 10:36:58 -08:00
Ingo Molnar	58d4292bd0	rcu: Uninline multi-use function: finish_rcuwait() This is a rarely used function, so uninlining its 3 instructions is probably a win or a wash - but the main motivation is to make <linux/rcuwait.h> independent of task_struct details. Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-14 10:36:58 -08:00
Paul E. McKenney	c099290310	rcu: Mark writes to the rcu_segcblist structure's ->flags field KCSAN reports data races between the rcu_segcblist_clear_flags() and rcu_segcblist_set_flags() functions, though misreporting the latter as a call to rcu_segcblist_is_enabled() from call_rcu(). This commit converts the updates of this field to WRITE_ONCE(), relying on the resulting unmarked reads to continue to detect buggy concurrent writes to this field. Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org>	2022-02-14 10:36:58 -08:00
Zqiang	d818cc76e2	kasan: Record work creation stack trace with interrupts enabled Recording the work creation stack trace for KASAN reports in call_rcu() is expensive, due to unwinding the stack, but also due to acquiring depot_lock inside stackdepot (which may be contended). Because calling kasan_record_aux_stack_noalloc() does not require interrupts to already be disabled, this may unnecessarily extend the time with interrupts disabled. Therefore, move calling kasan_record_aux_stack() before the section with interrupts disabled. Acked-by: Marco Elver <elver@google.com> Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-14 10:36:58 -08:00
Paul E. McKenney	1fe09ebe7a	rcu: Inline __call_rcu() into call_rcu() Because __call_rcu() is invoked only by call_rcu(), this commit inlines the former into the latter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-14 10:36:58 -08:00
David Woodhouse	218b957a69	rcu: Add mutex for rcu boost kthread spawning and affinity setting As we handle parallel CPU bringup, we will need to take care to avoid spawning multiple boost threads, or race conditions when setting their affinity. Spotted by Paul McKenney. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-14 10:36:35 -08:00
Paul E. McKenney	00a8b4b54c	rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention Currently, call_rcu_tasks_generic() sets ->percpu_enqueue_shift to order_base_2(nr_cpu_ids) upon encountering sufficient contention. This does not shift to use of non-CPU-0 callback queues as intended, but rather continues using only CPU 0's queue. Although this does provide some decrease in contention due to spreading work over multiple locks, it is not the dramatic decrease that was intended. This commit therefore makes call_rcu_tasks_generic() set ->percpu_enqueue_shift to 0. Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-08 10:13:22 -08:00
Paul E. McKenney	2bcd18e041	rcu-tasks: Use order_base_2() instead of ilog2() The ilog2() function can be used to generate a shift count, but it will generate the same count for a power of two as for one greater than a power of two. This results in shift counts that are larger than necessary for systems with a power-of-two number of CPUs because the CPUs are numbered from zero, so that the maximum CPU number is one less than that power of two. This commit therefore substitutes order_base_2(), which appears to have been designed for exactly this use case. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-08 10:13:12 -08:00
Paul E. McKenney	5ae0f1b58b	rcu: Create and use an rcu_rdp_cpu_online() The pattern "rdp->grpmask & rcu_rnp_online_cpus(rnp)" occurs frequently in RCU code in order to determine whether rdp->cpu is online from an RCU perspective. This commit therefore creates an rcu_rdp_cpu_online() function to replace it. [ paulmck: Apply kernel test robot unused-variable feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-08 10:12:28 -08:00
Paul E. McKenney	80b3fd474c	rcu: Make rcu_barrier() no longer block CPU-hotplug operations This commit removes the cpus_read_lock() and cpus_read_unlock() calls from rcu_barrier(), thus allowing CPUs to come and go during the course of rcu_barrier() execution. Posting of the ->barrier_head callbacks does synchronize with portions of RCU's CPU-hotplug notifiers, but these locks are held for short time periods on both sides. Thus, full CPU-hotplug operations could both start and finish during the execution of a given rcu_barrier() invocation. Additional synchronization is provided by a global ->barrier_lock. Since the ->barrier_lock is only used during rcu_barrier() execution and during onlining/offlining a CPU, the contention for this lock should be low. It might be tempting to make use of a per-CPU lock just on general principles, but straightforward attempts to do this have the problems shown below. Initial state: 3 CPUs present, CPU 0 and CPU1 do not have any callback and CPU2 has callbacks. 1. CPU0 calls rcu_barrier(). 2. CPU1 starts offlining for CPU2. CPU1 calls rcutree_migrate_callbacks(). rcu_barrier_entrain() is called from rcutree_migrate_callbacks(), with CPU2's rdp->barrier_lock. It does not entrain ->barrier_head for CPU2, as rcu_barrier() on CPU0 hasn't started the barrier sequence (by calling rcu_seq_start(&rcu_state.barrier_sequence)) yet. 3. CPU0 starts new barrier sequence. It iterates over CPU0 and CPU1, after acquiring their per-cpu ->barrier_lock and finds 0 segcblist length. It updates ->barrier_seq_snap for CPU0 and CPU1 and continues loop iteration to CPU2. for_each_possible_cpu(cpu) { raw_spin_lock_irqsave(&rdp->barrier_lock, flags); if (!rcu_segcblist_n_cbs(&rdp->cblist)) { WRITE_ONCE(rdp->barrier_seq_snap, gseq); raw_spin_unlock_irqrestore(&rdp->barrier_lock, flags); rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence); continue; } 4. rcutree_migrate_callbacks() completes execution on CPU1. Segcblist len for CPU2 becomes 0. 5. The loop iteration on CPU0, checks rcu_segcblist_n_cbs(&rdp->cblist) for CPU2 and completes the loop iteration after setting ->barrier_seq_snap. 6. As there isn't any ->barrier_head callback entrained; at this point, rcu_barrier() in CPU0 returns. 7. The callbacks, which migrated from CPU2 to CPU1, execute. Straightforward per-CPU locking is also subject to the following race condition noted by Boqun Feng: 1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking rcu_seq_start() and init_completion(), but does not yet initialize rcu_state.barrier_cpu_count. 2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(), which in turn calls rcu_barrier_entrain() holding CPU2's. rdp->barrier_lock. It then entrains ->barrier_head for CPU2 and atomically increments rcu_state.barrier_cpu_count, which is unfortunately not yet initialized to the value 2. 3. The just-entrained RCU callback is invoked. It atomically decrements rcu_state.barrier_cpu_count and sees that it is now zero. This callback therefore invokes complete(). 4. CPU0 continues executing rcu_barrier(), but is not blocked by its call to wait_for_completion(). This results in rcu_barrier() returning before all pre-existing callbacks have been invoked, which is a bug. Therefore, synchronization is provided by rcu_state.barrier_lock, which is also held across the initialization sequence, especially the rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count to the value 2. In addition, this lock is held when entraining the rcu_barrier() callback, when deciding whether or not a CPU has callbacks that rcu_barrier() must wait on, when setting the ->qsmaskinitnext for incoming CPUs, and when migrating callbacks from a CPU that is going offline. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-08 10:12:28 -08:00
Paul E. McKenney	a16578dd5e	rcu: Rework rcu_barrier() and callback-migration logic This commit reworks rcu_barrier() and callback-migration logic to permit allowing rcu_barrier() to run concurrently with CPU-hotplug operations. The key trick is for callback migration to check to see if an rcu_barrier() is in flight, and, if so, enqueue the ->barrier_head callback on its behalf. This commit adds synchronization with RCU's CPU-hotplug notifiers. Taken together, this will permit a later commit to remove the cpus_read_lock() and cpus_read_unlock() calls from rcu_barrier(). [ paulmck: Updated per kbuild test robot feedback. ] [ paulmck: Updated per reviews session with Neeraj, Frederic, Uladzislau, and Boqun. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-08 10:12:28 -08:00
Paul E. McKenney	0cabb47af3	rcu: Refactor rcu_barrier() empty-list handling This commit saves a few lines by checking first for an empty callback list. If the callback list is empty, then that CPU is taken care of, regardless of its online or nocb state. Also simplify tracing accordingly and fold a few lines together. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-08 10:12:28 -08:00
David Woodhouse	82980b1622	rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion If we allow architectures to bring APs online in parallel, then we end up requiring rcu_cpu_starting() to be reentrant. But currently, the manipulation of rnp->ofl_seq is not thread-safe. However, rnp->ofl_seq is also fairly much pointless anyway since both rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for fairly much the whole time that rnp->ofl_seq is set to an odd number to indicate that an operation is in progress. So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock. This has a couple of minor complexities: lockdep will complain when we take rcu_state.ofl_lock, and currently accepts the 'excuse' of having an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to avoid that false positive complaint. Since we're killing rnp->ofl_seq of course that 'excuse' has to be changed too, so make it check for arch_spin_is_locked(rcu_state.ofl_lock). There's no arch_spin_lock_irqsave() so we have to manually save and restore local interrupts around the locking. At Paul's request based on Neeraj's analysis, make rcu_gp_init not just wait but exclude any CPU online/offline activity, which was fairly much true already by virtue of it holding rcu_state.ofl_lock. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-08 10:11:41 -08:00
Paul E. McKenney	9c0f1c7fd7	rcutorture: Enable limited callback-flooding tests of SRCU This commit allows up to 50,000 callbacks worth of callback-flooding tests of SRCU. The goal of this change is to exercise Tree SRCU's ability to transition from SRCU_SIZE_SMALL to SRCU_SIZE_BIG triggered by callback-queue-time lock contention. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:39 -08:00
Paul E. McKenney	89440d2dad	rcutorture: Fix rcu_fwd_mutex deadlock The rcu_torture_fwd_cb_hist() function acquires rcu_fwd_mutex, but is invoked from rcutorture_oom_notify() function, which hold this same mutex across this call. This commit fixes the resulting deadlock. Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:39 -08:00
Paul E. McKenney	02b51a1cf4	rcutorture: Add end-of-test check to rcu_torture_fwd_prog() loop The second and subsequent forward-progress kthreads loop waiting for the first forward-progress kthread to start the next test interval. Unfortunately, if the test ends while one of those kthreads is waiting, the test will hang. This hang occurs because that wait loop fails to check for the end of the test. This commit therefore adds an end-of-test check to that wait loop. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:38 -08:00
Paul E. McKenney	e22ef8df41	rcutorture: Make rcu_fwd_cb_nodelay be a counter Back when only one rcutorture kthread could do forward-progress testing, it was just fine for rcu_fwd_cb_nodelay to be a non-atomic bool. It was set at the start of forward-progress testing and cleared at the end. But now that there are multiple threads, the value can be cleared while one of the threads is still doing forward-progress testing. This commit therefore makes rcu_fwd_cb_nodelay be an atomic counter, replacing the WRITE_ONCE() operations with atomic_inc() and atomic_dec(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:38 -08:00
Paul E. McKenney	05b724655b	rcutorture: Increase visibility of forward-progress hangs This commit adds a few pr_alert() calls to rcutorture's forward-progress testing in order to better diagnose shutdown-time hangs. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:38 -08:00
Paul E. McKenney	6f81bd6a4e	rcutorture: Print message before invoking ->cb_barrier() The various ->cb_barrier() functions, for example, rcu_barrier(), sometimes cause rcutorture hangs. But currently, the last console message is the unenlightening "Stopping rcu_torture_stats". This commit therefore prints a message of the form "rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x1e0()" to help point people in the right direction. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:38 -08:00
Zqiang	c951587585	rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings When the rcutree.use_softirq kernel boot parameter is set to zero, all RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads. If these kthreads are being starved, quiescent states will not be reported, which in turn means that the grace period will not end, which can in turn trigger RCU CPU stall warnings. This commit therefore dumps stack traces of stalled CPUs' rcuc kthreads, which can help identify what is preventing those kthreads from running. Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:22:17 -08:00
Paul E. McKenney	10c5357874	rcu: Don't deboost before reporting expedited quiescent state Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx before reporting the expedited quiescent state. Under heavy real-time load, this can result in this function being preempted before the quiescent state is reported, which can in turn prevent the expedited grace period from completing. Tim Murray reports that the resulting expedited grace periods can take hundreds of milliseconds and even more than one second, when they should normally complete in less than a millisecond. This was fine given that there were no particular response-time constraints for synchronize_rcu_expedited(), as it was designed for throughput rather than latency. However, some users now need sub-100-millisecond response-time constratints. This patch therefore follows Neeraj's suggestion (seconded by Tim and by Uladzislau Rezki) of simply reversing the two operations. Reported-by: Tim Murray <timmurray@google.com> Reported-by: Joel Fernandes <joelaf@google.com> Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Tim Murray <timmurray@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:41 -08:00
Alison Chaiken	c8b16a6526	rcu: Elevate priority of offloaded callback threads When CONFIG_PREEMPT_RT=y, the rcutree.kthread_prio command-line parameter signals initialization code to boost the priority of rcuc callbacks to the designated value. With the additional CONFIG_RCU_NOCB_CPU=y configuration and an additional rcu_nocbs command-line parameter, the callbacks on the listed cores are offloaded to new rcuop kthreads that are not pinned to the cores whose post-grace-period work is performed. While the rcuop kthreads perform the same function as the rcuc kthreads they offload, the kthread_prio parameter only boosts the priority of the rcuc kthreads. Fix this inconsistency by elevating rcuop kthreads to the same priority as the rcuc kthreads. Signed-off-by: Alison Chaiken <achaiken@aurora.tech> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:25 -08:00
Alison Chaiken	54577e23fa	rcu: Make priority of grace-period thread consistent The priority of RCU grace period threads is set to kthread_prio when they are launched from rcu_spawn_gp_kthread(). The same is not true of rcu_spawn_one_nocb_kthread(). Accordingly, add priority elevation to rcu_spawn_one_nocb_kthread(). Signed-off-by: Alison Chaiken <achaiken@aurora.tech> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:17 -08:00
Alison Chaiken	c8db27dd0e	rcu: Move kthread_prio bounds-check to a separate function Move the bounds-check of the kthread_prio cmdline parameter to a new function in order to faciliate a different callsite. Signed-off-by: Alison Chaiken <achaiken@aurora.tech> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Zqiang	4b4399b245	rcu: Create per-cpu rcuc kthreads only when rcutree.use_softirq=0 The per-CPU "rcuc" kthreads are used only by kernels booted with rcutree.use_softirq=0, but they are nevertheless unconditionally created by kernels built with CONFIG_RCU_BOOST=y. This results in "rcuc" kthreads being created that are never actually used. This commit therefore refrains from creating these kthreads unless the kernel is actually booted with rcutree.use_softirq=0. Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Neeraj Upadhyay	eae9f147a4	rcu: Remove unused rcu_state.boost Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Neeraj Upadhyay	02e3024175	rcu/nocb: Handle concurrent nocb kthreads creation When multiple CPUs in the same nocb gp/cb group concurrently come online, they might try to concurrently create the same rcuog kthread. Fix this by using nocb gp CPU's spawn mutex to provide mutual exclusion for the rcuog kthread creation code. [ paulmck: Whitespace fixes per kernel test robot feedback. ] Acked-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Paul E. McKenney	a47f9f131d	rcu: Mark accesses to boost_starttime The boost_starttime shared variable has conflicting unmarked C-language accesses, which are dangerous at best. This commit therefore adds appropriate marking. This was found by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Paul E. McKenney	63c564da11	rcu: Mark ->expmask access in synchronize_rcu_expedited_wait() This commit adds a READ_ONCE() to an access to the rcu_node structure's ->expmask field to prevent compiler mischief. Detected by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:05:10 -08:00
Neeraj Upadhyay	4d266c247d	rcu/exp: Fix check for idle context in rcu_exp_handler For PREEMPT_RCU, the rcu_exp_handler() function checks whether the current CPU is in idle, by calling rcu_dynticks_curr_cpu_in_eqs(). However, rcu_exp_handler() is called in IPI handler context. So, it should be checking the idle context using rcu_is_cpu_rrupt_from_idle(). Fix this by using rcu_is_cpu_rrupt_from_idle() instead of rcu_dynticks_curr_cpu_in_eqs(). Non-preempt configuration already uses the correct check. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:05:10 -08:00
Paul E. McKenney	da123016ca	rcu-tasks: Fix computation of CPU-to-list shift counts The ->percpu_enqueue_shift field is used to map from the running CPU number to the index of the corresponding callback list. This mapping can change at runtime in response to varying callback load, resulting in varying levels of contention on the callback-list locks. Unfortunately, the initial value of this field is correct only if the system happens to have a power-of-two number of CPUs, otherwise the callbacks from the high-numbered CPUs can be placed into the callback list indexed by 1 (rather than 0), and those index-1 callbacks will be ignored. This can result in soft lockups and hangs. This commit therefore corrects this mapping, adding one to this shift count as needed for systems having odd numbers of CPUs. Fixes: `7a30871b6a` ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection") Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-01-26 13:04:05 -08:00
Linus Torvalds	f56caedaf9	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "146 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak, dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap, memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb, userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp, ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits) mm/damon: hide kernel pointer from tracepoint event mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging mm/damon/dbgfs: remove an unnecessary variable mm/damon: move the implementation of damon_insert_region to damon.h mm/damon: add access checking for hugetlb pages Docs/admin-guide/mm/damon/usage: update for schemes statistics mm/damon/dbgfs: support all DAMOS stats Docs/admin-guide/mm/damon/reclaim: document statistics parameters mm/damon/reclaim: provide reclamation statistics mm/damon/schemes: account how many times quota limit has exceeded mm/damon/schemes: account scheme actions that successfully applied mm/damon: remove a mistakenly added comment for a future feature Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk\|rm)_contexts Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning Docs/admin-guide/mm/damon/usage: remove redundant information Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks mm/damon: convert macro functions to static inline functions mm/damon: modify damon_rand() macro to static inline function mm/damon: move damon_rand() definition into damon.h ...	2022-01-15 20:37:06 +02:00
Cai Huoqing	3b9cb4ba4b	rcutorture: make use of the helper function kthread_run_on_cpu() Replace kthread_create_on_node/kthread_bind/wake_up_process() with kthread_run_on_cpu() to simplify the code. Link: https://lkml.kernel.org/r/20211022025711.3673-5-caihuoqing@baidu.com Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Cc: Bernard Metzler <bmt@zurich.ibm.com> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Doug Ledford <dledford@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-15 16:30:24 +02:00
Paul E. McKenney	f80fe66c38	Merge branches 'doc.2021.11.30c', 'exp.2021.12.07a', 'fastnohz.2021.11.30c', 'fixes.2021.11.30c', 'nocb.2021.12.09a', 'nolibc.2021.11.30c', 'tasks.2021.12.09a', 'torture.2021.12.07a' and 'torturescript.2021.11.30c' into HEAD doc.2021.11.30c: Documentation updates. exp.2021.12.07a: Expedited-grace-period fixes. fastnohz.2021.11.30c: Remove CONFIG_RCU_FAST_NO_HZ. fixes.2021.11.30c: Miscellaneous fixes. nocb.2021.12.09a: No-CB CPU updates. nolibc.2021.11.30c: Tiny in-kernel library updates. tasks.2021.12.09a: RCU-tasks updates, including update-side scalability. torture.2021.12.07a: Torture-test in-kernel module updates. torturescript.2021.11.30c: Torture-test scripting updates.	2021-12-09 11:38:09 -08:00
Frederic Weisbecker	10d4703154	rcu/nocb: Merge rcu_spawn_cpu_nocb_kthread() and rcu_spawn_one_nocb_kthread() The rcu_spawn_one_nocb_kthread() function is called only from rcu_spawn_cpu_nocb_kthread(). Therefore, inline the former into the latter, saving a few lines of code. Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 11:35:16 -08:00
Frederic Weisbecker	d2cf0854d7	rcu/nocb: Allow empty "rcu_nocbs" kernel parameter Allow the rcu_nocbs kernel parameter to be specified just by itself, without specifying any CPUs. This allows systems administrators to use "rcu_nocbs" to specify that none of the CPUs are to be offloaded at boot time, but than any of them may be offloaded at runtime via cpusets. In contrast, if the "rcu_nocbs" or "nohz_full" kernel parameters are not specified at all, then not only are none of the CPUs offloaded at boot, none of them can be offloaded at runtime, either. While in the area, modernize the description of the "rcuo" kthreads' naming scheme. Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 11:35:11 -08:00
Frederic Weisbecker	2cf4528d6d	rcu/nocb: Create kthreads on all CPUs if "rcu_nocbs=" or "nohz_full=" are passed In order to be able to (de-)offload any CPU using cpusets in the future, create the NOCB data structures for all possible CPUs. For now this is done only as long as the "rcu_nocbs=" or "nohz_full=" kernel parameters are passed to avoid the unnecessary overhead for most users. Note that the rcuog and rcuoc kthreads are not created until at least one of the corresponding CPUs comes online. This approach avoids the creation of excess kthreads when firmware lies about the number of CPUs present on the system. Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 11:35:06 -08:00
Frederic Weisbecker	a81aeaf7a1	rcu/nocb: Optimize kthreads and rdp initialization Currently cpumask_available() is used to prevent from unwanted NOCB initialization. However if neither "rcu_nocbs=" nor "nohz_full=" parameters are passed to a kernel built with CONFIG_CPUMASK_OFFSTACK=n, the initialization path is still taken, running through all sorts of needless operations and iterations on an empty cpumask. Fix this by relying on a real initialization state instead. This also optimizes kthread creation, preventing needless iteration over all online CPUs when the kernel is booted without any offloaded CPUs. Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 11:34:37 -08:00
Frederic Weisbecker	8d97039646	rcu/nocb: Prepare nocb_cb_wait() to start with a non-offloaded rdp In order to be able to toggle the offloaded state from cpusets, a nocb kthread will need to be created for all possible CPUs whenever either of the "rcu_nocbs=" or "nohz_full=" parameters are specified. Therefore, the nocb_cb_wait() kthread must be prepared to start running on a de-offloaded rdp. To accomplish this, simply move the sleeping condition to the beginning of the nocb_cb_wait() function, which prevents this kthread from attempting to invoke callbacks before the corresponding CPU is offloaded. Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 11:34:30 -08:00
Frederic Weisbecker	2ebc45c44c	rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded The nocb_gp_wait() function iterates over all CPUs in its group, including even those CPUs that have been de-offloaded. This is of course suboptimal, especially if none of the CPUs within the group are currently offloaded. This will become even more of a problem once a nocb kthread is created for all possible CPUs. Therefore use a standard double linked list to link all the offloaded rcu_data structures and safely add or delete these structure as we offload or de-offload them, respectively. Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 11:34:07 -08:00
Paul E. McKenney	fd796e4139	rcu-tasks: Use fewer callbacks queues if callback flood ends By default, when lock contention is encountered, the RCU Tasks flavors of RCU switch to using per-CPU queueing. However, if the callback flood ends, per-CPU queueing continues to be used, which introduces significant additional overhead, especially for callback invocation, which fans out a series of workqueue handlers. This commit therefore switches back to single-queue operation if at the beginning of a grace period there are very few callbacks. The definition of "very few" is set by the rcupdate.rcu_task_collapse_lim module parameter, which defaults to 10. This switch happens in two phases, with the first phase causing future callbacks to be enqueued on CPU 0's queue, but with all queues continuing to be checked for grace periods and callback invocation. The second phase checks to see if an RCU grace period has elapsed and if all remaining RCU-Tasks callbacks are queued on CPU 0. If so, only CPU 0 is checked for future grace periods and callback operation. Of course, the return of contention anywhere during this process will result in returning to per-CPU callback queueing. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 10:52:11 -08:00
Paul E. McKenney	2cee0789b4	rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing Decreasing the number of callback queues is a bit tricky because it is necessary to handle callbacks that were queued before the number of queues decreased, but which were not ready to invoke until afterwards. This commit takes a first step in this direction by maintaining a separate ->percpu_dequeue_lim to control callback dequeueing, in addition to the existing ->percpu_enqueue_lim which now controls only enqueueing. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 10:52:11 -08:00
Paul E. McKenney	ab97152f88	rcu-tasks: Use more callback queues if contention encountered The rcupdate.rcu_task_enqueue_lim module parameter allows system administrators to tune the number of callback queues used by the RCU Tasks flavors. However if callback storms are infrequent, it would be better to operate with a single queue on a given system unless and until that system actually needed more queues. Systems not needing more queues can then avoid the overhead of checking the extra queues and especially avoid the overhead of fanning workqueue handlers out to all CPUs to invoke callbacks. This commit therefore switches to using all the CPUs' callback queues if call_rcu_tasks_generic() encounters too much lock contention. The amount of lock contention to tolerate defaults to 100 contended lock acquisitions per jiffy, and can be adjusted using the new rcupdate.rcu_task_contend_lim module parameter. Such switching is undertaken only if the rcupdate.rcu_task_enqueue_lim module parameter is negative, which is its default value (-1). This allows savvy systems administrators to set the number of queues to some known good value and to not have to worry about the kernel doing any second guessing. [ paulmck: Apply feedback from Guillaume Tucker and kernelci. ] Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 10:52:11 -08:00
Paul E. McKenney	3063b33a34	rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic() If the caller of of call_rcu_tasks(), call_rcu_tasks_rude(), or call_rcu_tasks_trace() holds a raw spinlock, and then if call_rcu_tasks_generic() determines that the grace-period kthread must be awakened, then the wakeup might acquire a normal spinlock while a raw spinlock is held. This results in lockdep splats when the kernel is built with CONFIG_PROVE_RAW_LOCK_NESTING=y. This commit therefore defers the wakeup using irq_work_queue(). It would be nice to directly invoke wakeup when a raw spinlock is not held, but there is currently no way to check for this in all kernels. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2021-12-09 10:52:11 -08:00

1 2 3 4 5 ...

1998 Commits