OpenCloudOS-Kernel/kernel
Barry Song 1c10a174c1 sched: Add cluster scheduler level in core and related Kconfig for ARM64
mainline inclusion
from mainline-v5.16-rc1
commit 778c558f49 upstream.

------------------------------------------------------------------------

This patch adds scheduler level for clusters and automatically enables
the load balance among clusters. It will directly benefit a lot of
workload which loves more resources such as memory bandwidth, caches.

Testing has widely been done in two different hardware configurations of
Kunpeng920:

 24 cores in one NUMA(6 clusters in each NUMA node);
 32 cores in one NUMA(8 clusters in each NUMA node)

Workload is running on either one NUMA node or four NUMA nodes, thus,
this can estimate the effect of cluster spreading w/ and w/o NUMA load
balance.

* Stream benchmark:

4threads stream (on 1NUMA * 24cores = 24cores)
                stream                 stream
                w/o patch              w/ patch
MB/sec copy     29929.64 (   0.00%)    32932.68 (  10.03%)
MB/sec scale    29861.10 (   0.00%)    32710.58 (   9.54%)
MB/sec add      27034.42 (   0.00%)    32400.68 (  19.85%)
MB/sec triad    27225.26 (   0.00%)    31965.36 (  17.41%)

6threads stream (on 1NUMA * 24cores = 24cores)
                stream                 stream
                w/o patch              w/ patch
MB/sec copy     40330.24 (   0.00%)    42377.68 (   5.08%)
MB/sec scale    40196.42 (   0.00%)    42197.90 (   4.98%)
MB/sec add      37427.00 (   0.00%)    41960.78 (  12.11%)
MB/sec triad    37841.36 (   0.00%)    42513.64 (  12.35%)

12threads stream (on 1NUMA * 24cores = 24cores)
                stream                 stream
                w/o patch              w/ patch
MB/sec copy     52639.82 (   0.00%)    53818.04 (   2.24%)
MB/sec scale    52350.30 (   0.00%)    53253.38 (   1.73%)
MB/sec add      53607.68 (   0.00%)    55198.82 (   2.97%)
MB/sec triad    54776.66 (   0.00%)    56360.40 (   2.89%)

Thus, it could help memory-bound workload especially under medium load.
Similar improvement is also seen in lkp-pbzip2:

* lkp-pbzip2 benchmark

2-96 threads (on 4NUMA * 24cores = 96cores)
                  lkp-pbzip2              lkp-pbzip2
                  w/o patch               w/ patch
Hmean     tput-2   11062841.57 (   0.00%)  11341817.51 *   2.52%*
Hmean     tput-5   26815503.70 (   0.00%)  27412872.65 *   2.23%*
Hmean     tput-8   41873782.21 (   0.00%)  43326212.92 *   3.47%*
Hmean     tput-12  61875980.48 (   0.00%)  64578337.51 *   4.37%*
Hmean     tput-21 105814963.07 (   0.00%) 111381851.01 *   5.26%*
Hmean     tput-30 150349470.98 (   0.00%) 156507070.73 *   4.10%*
Hmean     tput-48 237195937.69 (   0.00%) 242353597.17 *   2.17%*
Hmean     tput-79 360252509.37 (   0.00%) 362635169.23 *   0.66%*
Hmean     tput-96 394571737.90 (   0.00%) 400952978.48 *   1.62%*

2-24 threads (on 1NUMA * 24cores = 24cores)
                 lkp-pbzip2               lkp-pbzip2
                 w/o patch                w/ patch
Hmean     tput-2   11071705.49 (   0.00%)  11296869.10 *   2.03%*
Hmean     tput-4   20782165.19 (   0.00%)  21949232.15 *   5.62%*
Hmean     tput-6   30489565.14 (   0.00%)  33023026.96 *   8.31%*
Hmean     tput-8   40376495.80 (   0.00%)  42779286.27 *   5.95%*
Hmean     tput-12  61264033.85 (   0.00%)  62995632.78 *   2.83%*
Hmean     tput-18  86697139.39 (   0.00%)  86461545.74 (  -0.27%)
Hmean     tput-24 104854637.04 (   0.00%) 104522649.46 *  -0.32%*

In the case of 6 threads and 8 threads, we see the greatest performance
improvement.

Similar improvement can be seen on lkp-pixz though the improvement is
smaller:

* lkp-pixz benchmark

2-24 threads lkp-pixz (on 1NUMA * 24cores = 24cores)
                  lkp-pixz               lkp-pixz
                  w/o patch              w/ patch
Hmean     tput-2   6486981.16 (   0.00%)  6561515.98 *   1.15%*
Hmean     tput-4  11645766.38 (   0.00%) 11614628.43 (  -0.27%)
Hmean     tput-6  15429943.96 (   0.00%) 15957350.76 *   3.42%*
Hmean     tput-8  19974087.63 (   0.00%) 20413746.98 *   2.20%*
Hmean     tput-12 28172068.18 (   0.00%) 28751997.06 *   2.06%*
Hmean     tput-18 39413409.54 (   0.00%) 39896830.55 *   1.23%*
Hmean     tput-24 49101815.85 (   0.00%) 49418141.47 *   0.64%*

* SPECrate benchmark

4,8,16 copies mcf_r(on 1NUMA * 32cores = 32cores)
		Base     	 	Base
		Run Time   	 	Rate
		-------  	 	---------
4 Copies	w/o 580 (w/ 570)       	w/o 11.1 (w/ 11.3)
8 Copies	w/o 647 (w/ 605)       	w/o 20.0 (w/ 21.4, +7%)
16 Copies	w/o 844 (w/ 844)       	w/o 30.6 (w/ 30.6)

32 Copies(on 4NUMA * 32 cores = 128cores)
[w/o patch]
                 Base     Base        Base
Benchmarks       Copies  Run Time     Rate
--------------- -------  ---------  ---------
500.perlbench_r      32        584       87.2  *
502.gcc_r            32        503       90.2  *
505.mcf_r            32        745       69.4  *
520.omnetpp_r        32       1031       40.7  *
523.xalancbmk_r      32        597       56.6  *
525.x264_r            1         --            CE
531.deepsjeng_r      32        336      109    *
541.leela_r          32        556       95.4  *
548.exchange2_r      32        513      163    *
557.xz_r             32        530       65.2  *
 Est. SPECrate2017_int_base              80.3

[w/ patch]
                  Base     Base        Base
Benchmarks       Copies  Run Time     Rate
--------------- -------  ---------  ---------
500.perlbench_r      32        580      87.8 (+0.688%)  *
502.gcc_r            32        477      95.1 (+5.432%)  *
505.mcf_r            32        644      80.3 (+13.574%) *
520.omnetpp_r        32        942      44.6 (+9.58%)   *
523.xalancbmk_r      32        560      60.4 (+6.714%%) *
525.x264_r            1         --           CE
531.deepsjeng_r      32        337      109  (+0.000%) *
541.leela_r          32        554      95.6 (+0.210%) *
548.exchange2_r      32        515      163  (+0.000%) *
557.xz_r             32        524      66.0 (+1.227%) *
 Est. SPECrate2017_int_base              83.7 (+4.062%)

On the other hand, it is slightly helpful to CPU-bound tasks like
kernbench:

* 24-96 threads kernbench (on 4NUMA * 24cores = 96cores)
                     kernbench              kernbench
                     w/o cluster            w/ cluster
Min       user-24    12054.67 (   0.00%)    12024.19 (   0.25%)
Min       syst-24     1751.51 (   0.00%)     1731.68 (   1.13%)
Min       elsp-24      600.46 (   0.00%)      598.64 (   0.30%)
Min       user-48    12361.93 (   0.00%)    12315.32 (   0.38%)
Min       syst-48     1917.66 (   0.00%)     1892.73 (   1.30%)
Min       elsp-48      333.96 (   0.00%)      332.57 (   0.42%)
Min       user-96    12922.40 (   0.00%)    12921.17 (   0.01%)
Min       syst-96     2143.94 (   0.00%)     2110.39 (   1.56%)
Min       elsp-96      211.22 (   0.00%)      210.47 (   0.36%)
Amean     user-24    12063.99 (   0.00%)    12030.78 *   0.28%*
Amean     syst-24     1755.20 (   0.00%)     1735.53 *   1.12%*
Amean     elsp-24      601.60 (   0.00%)      600.19 (   0.23%)
Amean     user-48    12362.62 (   0.00%)    12315.56 *   0.38%*
Amean     syst-48     1921.59 (   0.00%)     1894.95 *   1.39%*
Amean     elsp-48      334.10 (   0.00%)      332.82 *   0.38%*
Amean     user-96    12925.27 (   0.00%)    12922.63 (   0.02%)
Amean     syst-96     2146.66 (   0.00%)     2122.20 *   1.14%*
Amean     elsp-96      211.96 (   0.00%)      211.79 (   0.08%)

Note this patch isn't an universal win, it might hurt those workload
which can benefit from packing. Though tasks which want to take
advantages of lower communication latency of one cluster won't
necessarily been packed in one cluster while kernel is not aware of
clusters, they have some chance to be randomly packed. But this
patch will make them more likely spread.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Xue Sinian <tangyuan911@yeah.net>
2024-11-10 06:30:07 +08:00
..
bpf tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
cgroup tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
configs dist,Makefile: generic-debug config only build kernel rpm 2024-09-25 19:07:13 +08:00
debug tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
dma tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
events tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
gcov tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
irq tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
livepatch tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
locking tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
power tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
printk tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
rcu tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
sched sched: Add cluster scheduler level in core and related Kconfig for ARM64 2024-11-10 06:30:07 +08:00
time tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
tkernel tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
trace tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
.gitignore ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
Kconfig.freezer treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Kconfig.hz treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Kconfig.locks treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Kconfig.preempt tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
Kconfig.tkernel tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
Makefile watchdog: make hardlockup detect code public 2024-11-05 18:55:17 +08:00
acct.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
async.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
audit.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
audit.h tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
audit_fsnotify.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
audit_tree.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
audit_watch.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
auditfilter.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
auditsc.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
backtracetest.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
bounds.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
capability.c
compat.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
configs.c kernel/configs: Replace GPL boilerplate code with SPDX identifier 2019-07-30 18:34:15 +02:00
context_tracking.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
cpu.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
cpu_pm.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
crash_core.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
crash_dump.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
cred.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
delayacct.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
dma.c
exec_domain.c
exit.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
extable.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
fail_function.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
fork.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
freezer.c Revert "libata, freezer: avoid block device removal while system is frozen" 2019-10-06 09:11:37 -06:00
futex.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
gen_kheaders.sh ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
groups.c
hung_task.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
iomem.c mm/nvdimm: add is_ioremap_addr and use that to check ioremap address 2019-07-12 11:05:40 -07:00
irq_work.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
jump_label.c jump_label: Don't warn on __exit jump entries 2019-08-29 15:10:10 +01:00
kallsyms.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
kcmp.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
kcov.c kcov: convert kcov.refcount to refcount_t 2019-03-07 18:32:02 -08:00
kexec.c kexec_load: Disable at runtime if the kernel is locked down 2019-08-19 21:54:15 -07:00
kexec_core.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
kexec_elf.c kexec_elf: support 32 bit ELF files 2019-09-06 23:58:44 +02:00
kexec_file.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
kexec_internal.h
kheaders.c kheaders: Move from proc to sysfs 2019-05-24 20:16:01 +02:00
kill_block.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
kmod.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
kprobes.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
ksysfs.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 170 2019-05-30 11:26:39 -07:00
kthread.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
latencytop.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
modsign_certificate.S tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
modsign_pubkey.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
module-internal.h tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
module.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
module_signature.c MODSIGN: Export module signature definitions 2019-08-05 18:39:56 -04:00
module_signing.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
notifier.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
nsproxy.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
padata.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
panic.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
params.c lockdown: Lock down module params that specify hardware parameters (eg. ioport) 2019-08-19 21:54:16 -07:00
pid.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
pid_namespace.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
profile.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
ptrace.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
range.c
reboot.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
relay.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
resource.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
rseq.c signal: Remove task parameter from force_sig 2019-05-27 09:36:28 -05:00
seccomp.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
signal.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
smp.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
smpboot.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
smpboot.h
softirq.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
stackleak.c
stacktrace.c stacktrace: Don't skip first entry on noncurrent tasks 2019-11-04 21:19:25 +01:00
stop_machine.c stop_machine: Avoid potential race behaviour 2019-10-17 12:47:12 +02:00
sys.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
sys_ni.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
sysctl-test.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
sysctl.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
sysctl_binary.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
task_work.c
taskstats.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
test_kprobes.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 2019-05-21 11:52:39 +02:00
torture.c torture: Remove exporting of internal functions 2019-08-01 14:30:22 -07:00
tracepoint.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
tsacct.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
ucount.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
uid16.c
uid16.h
umh.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
up.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
user-return-notifier.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
user.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
user_namespace.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
utsname.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
utsname_sysctl.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
watchdog.c kprobes/arm64: Blacklist sdei watchdog callback functions 2024-11-05 18:57:53 +08:00
watchdog_hld.c sdei_watchdog: Fix compile error when PPC_WATCHDOG is disable on PowerPC 2024-11-05 18:57:53 +08:00
workqueue.c tkernel: sync code to the same with tk4 pub/lts/0017-kabi 2024-06-12 13:13:20 +08:00
workqueue_internal.h sched/core, workqueues: Distangle worker accounting from rq lock 2019-04-16 16:55:15 +02:00