linux-sg2042/kernel
Waiman Long 617f3ef951 locking/rwsem: Remove reader optimistic spinning
Reader optimistic spinning is helpful when the reader critical section
is short and there aren't that many readers around. It also improves
the chance that a reader can get the lock as writer optimistic spinning
disproportionally favors writers much more than readers.

Since commit d3681e269f ("locking/rwsem: Wake up almost all readers
in wait queue"), all the waiting readers are woken up so that they can
all get the read lock and run in parallel. When the number of contending
readers is large, allowing reader optimistic spinning will likely cause
reader fragmentation where multiple smaller groups of readers can get
the read lock in a sequential manner separated by writers. That reduces
reader parallelism.

One possible way to address that drawback is to limit the number of
readers (preferably one) that can do optimistic spinning. These readers
act as representatives of all the waiting readers in the wait queue as
they will wake up all those waiting readers once they get the lock.

Alternatively, as reader optimistic lock stealing has already enhanced
fairness to readers, it may be easier to just remove reader optimistic
spinning and simplifying the optimistic spinning code as a result.

Performance measurements (locking throughput kops/s) using a locking
microbenchmark with 50/50 reader/writer distribution and turbo-boost
disabled was done on a 2-socket Cascade Lake system (48-core 96-thread)
to see the impacts of these changes:

  1) Vanilla     - 5.10-rc3 kernel
  2) Before      - 5.10-rc3 kernel with previous patches in this series
  2) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch
  3) no-rspin    - 5.10-rc3 kernel with reader spinning disabled

  # of threads  CS Load   Vanilla  Before   limit-rspin   no-rspin
  ------------  -------   -------  ------   -----------   --------
       2            1      5,185    5,662      5,214       5,077
       4            1      5,107    4,983      5,188       4,760
       8            1      4,782    4,564      4,720       4,628
      16            1      4,680    4,053      4,567       3,402
      32            1      4,299    1,115      1,118       1,098
      64            1      3,218      983      1,001         957
      96            1      1,938      944        957         930

       2           20      2,008    2,128      2,264       1,665
       4           20      1,390    1,033      1,046       1,101
       8           20      1,472    1,155      1,098       1,213
      16           20      1,332    1,077      1,089       1,122
      32           20        967      914        917         980
      64           20        787      874        891         858
      96           20        730      836        847         844

       2          100        372      356        360         355
       4          100        492      425        434         392
       8          100        533      537        529         538
      16          100        548      572        568         598
      32          100        499      520        527         537
      64          100        466      517        526         512
      96          100        406      497        506         509

The column "CS Load" represents the number of pause instructions issued
in the locking critical section. A CS load of 1 is extremely short and
is not likey in real situations. A load of 20 (moderate) and 100 (long)
are more realistic.

It can be seen that the previous patches in this series have reduced
performance in general except in highly contended cases with moderate
or long critical sections that performance improves a bit. This change
is mostly caused by the "Prevent potential lock starvation" patch that
reduce reader optimistic spinning and hence reduce reader fragmentation.

The patch that further limit reader optimistic spinning doesn't seem to
have too much impact on overall performance as shown in the benchmark
data.

The patch that disables reader optimistic spinning shows reduced
performance at lightly loaded cases, but comparable or slightly better
performance on with heavier contention.

This patch just removes reader optimistic spinning for now. As readers
are not going to do optimistic spinning anymore, we don't need to
consider if the OSQ is empty or not when doing lock stealing.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Link: https://lkml.kernel.org/r/20201121041416.12285-6-longman@redhat.com
2020-12-09 17:08:48 +01:00
..
bpf bpf: Relax return code check for subprograms 2020-11-14 08:17:27 -08:00
cgroup kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
configs
debug kdb: Fix pager search for multi-line strings 2020-10-01 14:44:08 +01:00
dma swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single 2020-11-02 10:10:39 -05:00
entry entry: Fix the incorrect ordering of lockdep and RCU check 2020-11-04 18:06:14 +01:00
events perf: Tweak perf_event_attr::exclusive semantics 2020-11-09 18:12:36 +01:00
gcov gcov: add support for GCC 10.1 2020-09-11 09:33:54 -07:00
irq A set of fixes for interrupt chip drivers: 2020-11-08 09:52:57 -08:00
kcsan kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
livepatch kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
locking locking/rwsem: Remove reader optimistic spinning 2020-12-09 17:08:48 +01:00
power PM: sleep: fix typo in kernel/power/process.c 2020-10-27 19:11:44 +01:00
printk Urgent printk fix for 5.10 2020-11-27 10:38:36 -08:00
rcu Merge branch 'urgent-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2020-11-17 10:31:56 -08:00
sched Yet two more places which invoke tracing from RCU disabled regions in the 2020-11-29 11:19:26 -08:00
time time: Prevent undefined behaviour in timespec64_to_ns() 2020-10-26 11:48:11 +01:00
trace Tracing fixes for 5.10-rc6 2020-12-01 15:30:18 -08:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile Kbuild updates for v5.10 2020-10-22 13:13:57 -07:00
acct.c kernel: acct.c: fix some kernel-doc nits 2020-10-16 11:11:19 -07:00
async.c treewide: Remove uninitialized_var() usage 2020-07-16 12:35:15 -07:00
audit.c audit: Remove redundant null check 2020-08-26 09:10:39 -04:00
audit.h audit: change unnecessary globals into statics 2020-08-17 20:26:58 -04:00
audit_fsnotify.c fsnotify: create method handle_inode_event() in fsnotify_operations 2020-07-27 23:25:50 +02:00
audit_tree.c \n 2020-08-06 19:29:51 -07:00
audit_watch.c fsnotify: create method handle_inode_event() in fsnotify_operations 2020-07-27 23:25:50 +02:00
auditfilter.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
auditsc.c audit/stable-5.9 PR 20200803 2020-08-04 14:20:26 -07:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
compat.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
configs.c
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu.c The changes in this cycle are: 2020-06-03 13:06:42 -07:00
cpu_pm.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
crash_core.c kdump: append kernel build-id string to VMCOREINFO 2020-08-12 10:58:01 -07:00
crash_dump.c
cred.c exec: Teach prepare_exec_creds how exec treats uids & gids 2020-05-20 14:44:21 -05:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c don't dump the threads that had been already exiting when zapped. 2020-10-28 16:39:49 -04:00
extable.c
fail_function.c fail_function: Remove a redundant mutex unlock 2020-11-19 11:58:16 -08:00
fork.c fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent 2020-11-08 11:18:39 -08:00
freezer.c
futex.c Merge branch 'locking/rwsem' 2020-12-09 17:08:45 +01:00
gen_kheaders.sh kbuild: add variables for compression tools 2020-06-06 23:42:01 +09:00
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c irq_work, smp: Allow irq_work on call_single_queue 2020-05-28 10:54:15 +02:00
jump_label.c kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
kallsyms.c treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
kcmp.c
kcov.c kcov: make some symbols static 2020-08-12 10:58:02 -07:00
kexec.c LSM: Introduce kernel_post_load_data() hook 2020-10-05 13:37:03 +02:00
kexec_core.c kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
kexec_elf.c
kexec_file.c kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED 2020-10-16 11:11:18 -07:00
kexec_internal.h
kheaders.c
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c kprobes: Tell lockdep about kprobe nesting 2020-11-04 09:46:06 -05:00
ksysfs.c
kthread.c kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled 2020-11-02 12:14:19 -08:00
latencytop.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
module-internal.h
module.c Modules updates for v5.10 2020-10-22 13:08:57 -07:00
module_signature.c
module_signing.c
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c nsproxy: support CLONE_NEWTIME with setns() 2020-07-08 11:14:22 +02:00
padata.c padata: fix possible padata_works_lock deadlock 2020-09-04 17:51:55 +10:00
panic.c panic: don't dump stack twice on warn 2020-11-14 11:26:04 -08:00
params.c params: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
pid.c pid: move pidfd_get_pid() to pid.c 2020-10-18 09:27:10 -07:00
pid_namespace.c kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
profile.c
ptrace.c ptrace: Set PF_SUPERPRIV when checking capability 2020-11-17 12:53:22 -08:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-14 11:26:03 -08:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c kernel/relay.c: drop unneeded initialization 2020-10-16 11:11:22 -07:00
resource.c kernel/resource: make iomem_resource implicit in release_mem_region_adjustable() 2020-10-16 11:11:18 -07:00
rseq.c
scftorture.c scftorture: Add cond_resched() to test loop 2020-08-24 18:38:38 -07:00
scs.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
seccomp.c seccomp: Set PF_SUPERPRIV when checking capability 2020-11-17 12:53:22 -08:00
signal.c ptrace: fix task_join_group_stop() for the case when current is traced 2020-11-02 12:14:19 -08:00
smp.c Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-10-18 14:34:50 -07:00
smpboot.c
smpboot.h
softirq.c softirq: Add debug check to __raise_softirq_irqoff() 2020-09-16 15:18:56 +02:00
stackleak.c stackleak: let stack_erasing_sysctl take a kernel pointer buffer 2020-09-19 13:13:39 -07:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix return type of static_call_init 2020-10-02 21:18:25 +02:00
stop_machine.c stop_machine, rcu: Mark functions as notrace 2020-10-26 12:12:27 +01:00
sys.c kernel/sys.c: fix prototype of prctl_get_tid_address() 2020-10-25 11:44:16 -07:00
sys_ni.c mm/madvise: introduce process_madvise() syscall: an external memory hinting API 2020-10-18 09:27:10 -07:00
sysctl-test.c
sysctl.c mm: allow a controlled amount of unfairness in the page lock 2020-09-17 10:26:41 -07:00
task_work.c task_work: cleanup notification modes 2020-10-17 15:05:30 -06:00
taskstats.c taskstats: move specifying netlink policy back to ops 2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c torture: Dump ftrace at shutdown only if requested 2020-06-29 12:01:45 -07:00
tracepoint.c tracepoint: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c
user-return-notifier.c
user.c user.c: make uidhash_table static 2020-06-04 19:06:24 -07:00
user_namespace.c kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
usermode_driver.c umd: Stop using split_argv 2020-07-07 11:58:59 -05:00
utsname.c nsproxy: add struct nsset 2020-05-09 13:57:12 +02:00
utsname_sysctl.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
watch_queue.c watch_queue: Limit the number of watches a user can hold 2020-08-17 09:39:18 -07:00
watchdog.c kernel/watchdog: fix watchdog_allowed_mask not used warning 2020-11-14 11:26:03 -08:00
watchdog_hld.c
workqueue.c workqueue: fix a kernel-doc warning 2020-10-16 07:28:20 +02:00
workqueue_internal.h