OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jie Liu	c350a9eb22	sched: Open the kernel configuration for cluster. commit aff649361671b432570e94c9056932f50dd6f101 openeuler. ---------------------------------------------------------------------- In the past configuration, CONFIG_SCHED_CLUSTER was not set. Now, we need to open the configuration. Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:59:36 +08:00
Yicong Yang	55462ed9c5	scheduler: Disable cluster scheduling by default commit 6afb257d6dd71085344e1472ea6e820b5dc0a8e3 openeuler. ---------------------------------------------------------------------- Disable cluster scheduling by default since it's not a universal win. Users can choose to enable it through sysctl or at boot time according to their scenario. Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:59:27 +08:00
Tim Chen	867ad8d322	scheduler: Add boot time enabling/disabling of cluster scheduling commit 9e68cc2bf535a2f4e3c33e7e53bbb15815b703c4 openeuler. Reference: https://lore.kernel.org/lkml/cover.1638563225.git.tim.c.chen@linux.intel.com/ ---------------------------------------------------------------------- Add the boot-time parameter sched_cluster to enable or disable cluster scheduling. Set the boot parameter as follows: sched_cluster=0 disables cluster scheduling; sched_cluster=1 enables cluster scheduling. Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:59:18 +08:00
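The boot parameter described in this commit can be illustrated with a small user-space sketch. This is not the kernel's parser: the function name `parse_sched_cluster` and the "last occurrence wins" rule are assumptions made for illustration; only the `sched_cluster=0` / `sched_cluster=1` syntax comes from the commit message.

```python
from typing import Optional


def parse_sched_cluster(cmdline: str) -> Optional[bool]:
    """Return True/False for sched_cluster=1/0, or None if it is not set.

    Hypothetical helper: the kernel parses this option internally; here we
    only model the documented syntax on a command-line string.
    """
    result: Optional[bool] = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only the two documented values are accepted; anything else is ignored.
        if key == "sched_cluster" and sep and value in ("0", "1"):
            result = value == "1"  # assumption: a later occurrence overrides an earlier one
    return result


if __name__ == "__main__":
    print(parse_sched_cluster("console=ttyAMA0 sched_cluster=1 loglevel=8"))
```

On a running system the same function could be fed the contents of /proc/cmdline to see which setting was requested at boot.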
Tim Chen	def4ef5550	scheduler: Add runtime knob sysctl_sched_cluster commit 8ce3e706b31409147f035c037055caa68e450ce5 openeuler. Reference: https://lore.kernel.org/lkml/cover.1638563225.git.tim.c.chen@linux.intel.com/ ---------------------------------------------------------------------- Allow run time configuration of the scheduler to use cluster scheduling. Configuration can be changed via the sysctl variable /proc/sys/kernel/sched_cluster. Setting it to 1 enables cluster scheduling and setting it to 0 turns it off. Cluster scheduling should benefit independent tasks by load balancing them between clusters. It reaps the most benefit when the system's CPUs are not fully busy, so we can spread the tasks out between the clusters to reduce contention on cluster resources (e.g. L2 cache). However, if the system is expected to operate close to full utilization, the system admin could turn this feature off so as not to incur extra load balancing overhead between the cluster domains. Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:59:08 +08:00
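A minimal sketch of how an administrator's script might read or flip this knob. The path /proc/sys/kernel/sched_cluster and the 0/1 values are taken from the commit message; the helper names and the choice to return None when the file is missing (for example on a kernel without this patch) are assumptions.

```python
from pathlib import Path
from typing import Optional

# Path named in the commit message; it exists only on kernels carrying the patch.
SCHED_CLUSTER_SYSCTL = Path("/proc/sys/kernel/sched_cluster")


def read_sched_cluster(path: Path = SCHED_CLUSTER_SYSCTL) -> Optional[bool]:
    """Return True if the knob reads 1, False if it reads anything else, None if absent."""
    try:
        text = path.read_text().strip()
    except FileNotFoundError:
        return None
    return text == "1"


def write_sched_cluster(enabled: bool, path: Path = SCHED_CLUSTER_SYSCTL) -> None:
    """Write 1 or 0 to the knob (needs root when pointed at the real sysctl file)."""
    path.write_text("1\n" if enabled else "0\n")


if __name__ == "__main__":
    print("cluster scheduling:", read_sched_cluster())
```

Passing the path as a parameter keeps the helpers testable against an ordinary file instead of the live sysctl.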
Tim Chen	394d06a94a	scheduler: Create SDTL_SKIP flag to skip topology level commit 211b6fb7d5a8558a453475a08a697e651ca2d0cb openeuler. Reference: https://lore.kernel.org/lkml/cover.1638563225.git.tim.c.chen@linux.intel.com/ ---------------------------------------------------------------------- A system admin may not want to use cluster scheduling. Make changes to allow cluster topology level to be skipped when building sched domains. Create SDTL_SKIP bit on the sched_domain_topology_level flag so we can check if the cluster topology level should be skipped when building sched domains. Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:58:59 +08:00
Barry Song	492ab85a92	sched/fair: Scan cluster before scanning LLC in wake-up path
Reference: https://lore.kernel.org/lkml/20220915073423.25535-1-yangyicong@huawei.com/
commit 0c3a4f986962ed94da6e26ba3ec0bdf700945894 openeuler.
----------------------------------------------------------------------
For platforms having clusters like Kunpeng920, CPUs within the same cluster have lower latency when synchronizing and accessing shared resources like cache. Thus, this patch tries to find an idle CPU within the cluster of the target CPU before scanning the whole LLC, to gain lower latency.

Testing has been done on Kunpeng920 by pinning tasks to one NUMA node and to two NUMA nodes. On Kunpeng920, each NUMA node has 8 clusters and each cluster has 4 CPUs.

With this patch, we noticed an improvement on tbench within one NUMA node and across two NUMA nodes.

On numa 0:
                    6.0-rc1                 patched
Hmean     1       351.20 (   0.00%)       396.45 *  12.88%*
Hmean     2       700.43 (   0.00%)       793.76 *  13.32%*
Hmean     4      1404.42 (   0.00%)      1583.62 *  12.76%*
Hmean     8      2833.31 (   0.00%)      3147.85 *  11.10%*
Hmean     16     5501.90 (   0.00%)      6089.89 *  10.69%*
Hmean     32    10428.59 (   0.00%)     10619.63 *   1.83%*
Hmean     64     8223.39 (   0.00%)      8306.93 *   1.02%*
Hmean     128    7042.88 (   0.00%)      7068.03 *   0.36%*

On numa 0-1:
                    6.0-rc1                 patched
Hmean     1       363.06 (   0.00%)       397.13 *   9.38%*
Hmean     2       721.68 (   0.00%)       789.84 *   9.44%*
Hmean     4      1435.15 (   0.00%)      1566.01 *   9.12%*
Hmean     8      2776.17 (   0.00%)      3007.05 *   8.32%*
Hmean     16     5471.71 (   0.00%)      6103.91 *  11.55%*
Hmean     32    10164.98 (   0.00%)     11531.81 *  13.45%*
Hmean     64    17143.28 (   0.00%)     20078.68 *  17.12%*
Hmean     128   14552.70 (   0.00%)     15156.41 *   4.15%*
Hmean     256   12827.37 (   0.00%)     13326.86 *   3.89%*

Note that neither Kunpeng920 nor x86 Jacobsville supports SMT, so the SMT branch in the code has not been tested, but it is supposed to work.
Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:58:50 +08:00
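The search order described in this commit (cluster of the target CPU first, then the rest of the LLC) can be sketched as a toy model. This is not the kernel's select_idle_cpu(): the function signature, the use of plain Python sets for CPU masks, and the initial check of the target CPU itself are simplifying assumptions.

```python
from typing import Iterable, Optional, Set


def pick_idle_cpu(target: int,
                  cluster_cpus: Iterable[int],
                  llc_cpus: Iterable[int],
                  idle: Set[int]) -> Optional[int]:
    """Toy model: prefer an idle CPU in the target's cluster, then fall back to the LLC."""
    if target in idle:  # assumption: the target itself is tried first
        return target

    cluster = list(cluster_cpus)
    for cpu in cluster:  # 1) scan the (small, low-latency) cluster
        if cpu in idle:
            return cpu

    scanned = set(cluster)
    for cpu in llc_cpus:  # 2) scan the remaining CPUs that share the LLC
        if cpu not in scanned and cpu in idle:
            return cpu
    return None  # nothing idle in this LLC


if __name__ == "__main__":
    # Kunpeng920-like layout from the commit message: 4 CPUs per cluster, 32 per NUMA node.
    print(pick_idle_cpu(0, range(0, 4), range(0, 32), idle={2, 9}))
```

With CPUs 2 and 9 idle, the model picks CPU 2 because it shares the target's cluster, even though both CPUs share the same LLC.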
Barry Song	620bbcc8cd	sched: Add per_cpu cluster domain info and cpus_share_lowest_cache API Reference: https://lore.kernel.org/lkml/20220915073423.25535-1-yangyicong@huawei.com/ commit 53ad6bf76d9c646e3c8494ed82d90f304c50de1f openeuler. ---------------------------------------------------------------------- Add per-cpu cluster domain info and cpus_share_lowest_cache() API. This is the preparation for the optimization of select_idle_cpu() on platforms with cluster scheduler level. Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:58:33 +08:00
Yicong Yang	c7e1663d7b	arch_topology: Make cluster topology span at least SMT CPUs
mainline inclusion from mainline-v6.0-rc5 commit `5ac251c8a0` upstream.
----------------------------------------------------------------------
Currently cpu_clustergroup_mask() returns the CPU mask if the cluster spans more or the same CPUs as cpu_coregroup_mask(). This results in a broken topology on non-cluster SMT machines when building with CONFIG_SCHED_CLUSTER=y.

Test with:
qemu-system-aarch64 -enable-kvm -machine virt \
  -net none \
  -cpu host \
  -bios ./QEMU_EFI.fd \
  -m 2G \
  -smp 48,sockets=2,cores=12,threads=2 \
  -kernel $Image \
  -initrd $Rootfs \
  -nographic -append "rdinit=init console=ttyAMA0 sched_verbose loglevel=8"

We get the error below:
[ 3.084568] BUG: arch topology borken
[ 3.084570] the SMT domain not a subset of the CLS domain

Since cluster is a level higher than SMT, fix this by making the cluster span at least the SMT CPUs.

Fixes: `bfcc439743` ("arch_topology: Limit span of cpu_clustergroup_mask()")
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20220905122615.12946-1-yangyicong@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:54:12 +08:00
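The rule this fix describes can be modelled with sets: when the cluster siblings would span the same or more CPUs than the core group, fall back to the SMT siblings rather than to the single CPU. The sketch below is written from the commit message, not copied from the patch; the function name and the frozenset representation of CPU masks are assumptions.

```python
from typing import FrozenSet


def clustergroup_mask(cluster_sibling: FrozenSet[int],
                      coregroup: FrozenSet[int],
                      smt_sibling: FrozenSet[int]) -> FrozenSet[int]:
    """Model of the span limit on the cluster mask for one CPU.

    If the core group is a subset of the cluster siblings, the cluster level
    would be redundant with (or larger than) MC, so return the SMT siblings.
    That keeps the SMT domain a subset of the CLS domain.
    """
    if coregroup <= cluster_sibling:
        return smt_sibling
    return cluster_sibling


if __name__ == "__main__":
    # SMT machine without real clusters: the cluster collapses to the SMT pair.
    print(sorted(clustergroup_mask(frozenset(range(24)), frozenset(range(24)), frozenset({0, 1}))))
```

Returning the SMT siblings instead of only the CPU itself is what avoids the "SMT domain not a subset of the CLS domain" error quoted in the commit message.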
Ionela Voinescu	33ac9901b5	arch_topology: Limit span of cpu_clustergroup_mask() mainline inclusion from mainline-v6.0-rc1 commit `bfcc439743` upstream. ---------------------------------------------------------------------- Currently the cluster identifier is not set on DT based platforms. The reset or default value is -1 for all the CPUs. Once we assign the cluster identifier values correctly, the cluster_sibling mask will be populated and returned by cpu_clustergroup_mask() to contribute in the creation of the CLS scheduling domain level, if SCHED_CLUSTER is enabled. To avoid topologies that will result in questionable or incorrect scheduling domains, impose restrictions regarding the span of clusters, as presented to scheduling domains building code: cluster_sibling should not span more or the same CPUs as cpu_coregroup_mask(). This is needed in order to obtain a strict separation between the MC and CLS levels, and maintain the same domains for existing platforms in the presence of CONFIG_SCHED_CLUSTER, where the new cluster information is redundant and irrelevant for the scheduler. While previously the scheduling domain builder code would have removed MC as redundant and kept CLS if SCHED_CLUSTER was enabled and the cpu_coregroup_mask() and cpu_clustergroup_mask() spanned the same CPUs, now CLS will be removed and MC kept. Link: https://lore.kernel.org/r/20220704101605.1318280-18-sudeep.holla@arm.com Cc: Darren Hart <darren@os.amperecomputing.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:54:02 +08:00
Dietmar Eggemann	40c88f0323	topology: Remove unused cpu_cluster_mask() mainline inclusion from mainline-v5.19-rc1 commit `15f214f9bd` upstream. ------------------------------------------------------------------------ default_topology[] uses cpu_clustergroup_mask() for the CLS level (guarded by CONFIG_SCHED_CLUSTER) which is currently provided by x86 (arch/x86/kernel/smpboot.c) and arm64 (drivers/base/arch_topology.c). Fixes: `778c558f49` ("sched: Add cluster scheduler level in core and related Kconfig for ARM64") Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20220513093433.425163-1-dietmar.eggemann@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:53:56 +08:00
Heiko Carstens	e9220219f0	topology/sysfs: export cluster attributes only if an architectures has support mainline inclusion from mainline-v5.17-rc1 commit `e795707703` upstream. ---------------------------------------------------------------------- The cluster_id and cluster_cpus topology sysfs attributes have been added with commit `c5e22feffd` ("topology: Represent clusters of CPUs within a die"). They are currently only used for x86, arm64, and riscv (via generic arch topology), however they are still present with bogus default values for all other architectures. Instead of forcing these new sysfs attributes on all architectures, make them visible only if an architecture opts in by defining both the topology_cluster_id and topology_cluster_cpumask attributes. This is similar to what was done when the book and drawer topology levels were introduced: avoid useless and therefore confusing sysfs attributes for architectures which cannot make use of them. This should not break any existing applications, since this is a new interface introduced with the v5.16 merge window. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20211129130309.3256168-3-hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:30:14 +08:00
Barry Song	1c10a174c1	sched: Add cluster scheduler level in core and related Kconfig for ARM64
mainline inclusion from mainline-v5.16-rc1 commit `778c558f49` upstream.
------------------------------------------------------------------------
This patch adds a scheduler level for clusters and automatically enables load balancing among clusters. It directly benefits workloads that want more resources, such as memory bandwidth and caches.

Testing has been done widely in two different hardware configurations of Kunpeng920:

 24 cores in one NUMA (6 clusters in each NUMA node);
 32 cores in one NUMA (8 clusters in each NUMA node)

The workload runs on either one NUMA node or four NUMA nodes, so this can estimate the effect of cluster spreading w/ and w/o NUMA load balance.

* Stream benchmark:

4threads stream (on 1NUMA * 24cores = 24cores)
                      stream                 stream
                      w/o patch              w/ patch
MB/sec copy     29929.64 (   0.00%)    32932.68 (  10.03%)
MB/sec scale    29861.10 (   0.00%)    32710.58 (   9.54%)
MB/sec add      27034.42 (   0.00%)    32400.68 (  19.85%)
MB/sec triad    27225.26 (   0.00%)    31965.36 (  17.41%)

6threads stream (on 1NUMA * 24cores = 24cores)
                      stream                 stream
                      w/o patch              w/ patch
MB/sec copy     40330.24 (   0.00%)    42377.68 (   5.08%)
MB/sec scale    40196.42 (   0.00%)    42197.90 (   4.98%)
MB/sec add      37427.00 (   0.00%)    41960.78 (  12.11%)
MB/sec triad    37841.36 (   0.00%)    42513.64 (  12.35%)

12threads stream (on 1NUMA * 24cores = 24cores)
                      stream                 stream
                      w/o patch              w/ patch
MB/sec copy     52639.82 (   0.00%)    53818.04 (   2.24%)
MB/sec scale    52350.30 (   0.00%)    53253.38 (   1.73%)
MB/sec add      53607.68 (   0.00%)    55198.82 (   2.97%)
MB/sec triad    54776.66 (   0.00%)    56360.40 (   2.89%)

Thus, it could help memory-bound workloads, especially under medium load.
Similar improvement is also seen in lkp-pbzip2:

* lkp-pbzip2 benchmark

2-96 threads (on 4NUMA * 24cores = 96cores)
                     lkp-pbzip2                lkp-pbzip2
                     w/o patch                 w/ patch
Hmean tput-2     11062841.57 (   0.00%)    11341817.51 *   2.52%*
Hmean tput-5     26815503.70 (   0.00%)    27412872.65 *   2.23%*
Hmean tput-8     41873782.21 (   0.00%)    43326212.92 *   3.47%*
Hmean tput-12    61875980.48 (   0.00%)    64578337.51 *   4.37%*
Hmean tput-21   105814963.07 (   0.00%)   111381851.01 *   5.26%*
Hmean tput-30   150349470.98 (   0.00%)   156507070.73 *   4.10%*
Hmean tput-48   237195937.69 (   0.00%)   242353597.17 *   2.17%*
Hmean tput-79   360252509.37 (   0.00%)   362635169.23 *   0.66%*
Hmean tput-96   394571737.90 (   0.00%)   400952978.48 *   1.62%*

2-24 threads (on 1NUMA * 24cores = 24cores)
                     lkp-pbzip2                lkp-pbzip2
                     w/o patch                 w/ patch
Hmean tput-2     11071705.49 (   0.00%)    11296869.10 *   2.03%*
Hmean tput-4     20782165.19 (   0.00%)    21949232.15 *   5.62%*
Hmean tput-6     30489565.14 (   0.00%)    33023026.96 *   8.31%*
Hmean tput-8     40376495.80 (   0.00%)    42779286.27 *   5.95%*
Hmean tput-12    61264033.85 (   0.00%)    62995632.78 *   2.83%*
Hmean tput-18    86697139.39 (   0.00%)    86461545.74 (  -0.27%)
Hmean tput-24   104854637.04 (   0.00%)   104522649.46 *  -0.32%*

In the case of 6 threads and 8 threads, we see the greatest performance improvement.

Similar improvement can be seen on lkp-pixz though the improvement is smaller:

* lkp-pixz benchmark

2-24 threads lkp-pixz (on 1NUMA * 24cores = 24cores)
                     lkp-pixz                  lkp-pixz
                     w/o patch                 w/ patch
Hmean tput-2      6486981.16 (   0.00%)     6561515.98 *   1.15%*
Hmean tput-4     11645766.38 (   0.00%)    11614628.43 (  -0.27%)
Hmean tput-6     15429943.96 (   0.00%)    15957350.76 *   3.42%*
Hmean tput-8     19974087.63 (   0.00%)    20413746.98 *   2.20%*
Hmean tput-12    28172068.18 (   0.00%)    28751997.06 *   2.06%*
Hmean tput-18    39413409.54 (   0.00%)    39896830.55 *   1.23%*
Hmean tput-24    49101815.85 (   0.00%)    49418141.47 *   0.64%*

* SPECrate benchmark

4,8,16 copies mcf_r (on 1NUMA * 32cores = 32cores)
             Base                  Base
             Run Time              Rate
             -------               ---------
4 Copies     w/o 580 (w/ 570)      w/o 11.1 (w/ 11.3)
8 Copies     w/o 647 (w/ 605)      w/o 20.0 (w/ 21.4, +7%)
16 Copies    w/o 844 (w/ 844)      w/o 30.6 (w/ 30.6)

32 Copies (on 4NUMA * 32 cores = 128cores)
[w/o patch]
                 Base     Base        Base
Benchmarks       Copies  Run Time     Rate
--------------- -------  ---------  ---------
500.perlbench_r      32        584       87.2  *
502.gcc_r            32        503       90.2  *
505.mcf_r            32        745       69.4  *
520.omnetpp_r        32       1031       40.7  *
523.xalancbmk_r      32        597       56.6  *
525.x264_r            1         --         CE
531.deepsjeng_r      32        336        109  *
541.leela_r          32        556       95.4  *
548.exchange2_r      32        513        163  *
557.xz_r             32        530       65.2  *
 Est. SPECrate2017_int_base              80.3

[w/ patch]
                 Base     Base        Base
Benchmarks       Copies  Run Time     Rate
--------------- -------  ---------  ---------
500.perlbench_r      32        580       87.8 (+0.688%)  *
502.gcc_r            32        477       95.1 (+5.432%)  *
505.mcf_r            32        644       80.3 (+13.574%) *
520.omnetpp_r        32        942       44.6 (+9.58%)   *
523.xalancbmk_r      32        560       60.4 (+6.714%)  *
525.x264_r            1         --         CE
531.deepsjeng_r      32        337        109 (+0.000%)  *
541.leela_r          32        554       95.6 (+0.210%)  *
548.exchange2_r      32        515        163 (+0.000%)  *
557.xz_r             32        524       66.0 (+1.227%)  *
 Est. SPECrate2017_int_base              83.7 (+4.062%)

On the other hand, it is slightly helpful to CPU-bound tasks like kernbench:

* 24-96 threads kernbench (on 4NUMA * 24cores = 96cores)
                     kernbench              kernbench
                     w/o cluster            w/ cluster
Min   user-24    12054.67 (   0.00%)    12024.19 (   0.25%)
Min   syst-24     1751.51 (   0.00%)     1731.68 (   1.13%)
Min   elsp-24      600.46 (   0.00%)      598.64 (   0.30%)
Min   user-48    12361.93 (   0.00%)    12315.32 (   0.38%)
Min   syst-48     1917.66 (   0.00%)     1892.73 (   1.30%)
Min   elsp-48      333.96 (   0.00%)      332.57 (   0.42%)
Min   user-96    12922.40 (   0.00%)    12921.17 (   0.01%)
Min   syst-96     2143.94 (   0.00%)     2110.39 (   1.56%)
Min   elsp-96      211.22 (   0.00%)      210.47 (   0.36%)
Amean user-24    12063.99 (   0.00%)    12030.78 *   0.28%*
Amean syst-24     1755.20 (   0.00%)     1735.53 *   1.12%*
Amean elsp-24      601.60 (   0.00%)      600.19 (   0.23%)
Amean user-48    12362.62 (   0.00%)    12315.56 *   0.38%*
Amean syst-48     1921.59 (   0.00%)     1894.95 *   1.39%*
Amean elsp-48      334.10 (   0.00%)      332.82 *   0.38%*
Amean user-96    12925.27 (   0.00%)    12922.63 (   0.02%)
Amean syst-96     2146.66 (   0.00%)     2122.20 *   1.14%*
Amean elsp-96      211.96 (   0.00%)      211.79 (   0.08%)

Note that this patch isn't a universal win: it might hurt workloads that benefit from packing. Tasks that want to take advantage of the lower communication latency within one cluster won't necessarily be packed into one cluster while the kernel is not aware of clusters, but they have some chance of being packed at random. This patch makes them more likely to be spread.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:30:07 +08:00
Jonathan Cameron	6e890d617f	topology: Represent clusters of CPUs within a die
mainline inclusion from mainline-v5.16-rc1 commit `c5e22feffd` upstream.
------------------------------------------------------------------------
Both ACPI and DT provide the ability to describe additional layers of topology between that of individual cores and higher level constructs such as the level at which the last level cache is shared. In ACPI this can be represented in PPTT as a Processor Hierarchy Node Structure [1] that is the parent of the CPU cores and in turn has a parent Processor Hierarchy Nodes Structure representing a higher level of topology.

For example, Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 CPUs. All clusters share L3 cache data, but each cluster has a local L3 tag. On the other hand, each cluster shares some internal system bus.

[ASCII diagram: several clusters of four CPUs (the first labeled CPU0 to CPU3), each cluster attached to its own L3 tag, with all L3 tags connected to one shared L3 data block]

That means spreading tasks among clusters will bring more bandwidth while packing tasks within one cluster will lead to smaller cache synchronization latency. So both kernel and userspace will have a chance to leverage this topology to deploy tasks accordingly to achieve either smaller cache latency within one cluster or an even distribution of load among clusters for higher throughput.

This patch exposes cluster topology to both kernel and userspace. Libraries like hwloc will know clusters by cluster_cpus and related sysfs attributes. PoC of HWLOC support at [2].

Note this patch only handles the ACPI case.

Special consideration is needed for SMT processors, where it is necessary to move 2 levels up the hierarchy from the leaf nodes (thus skipping the processor core level).

Note that arm64 / ACPI does not provide any means of identifying a die level in the topology but that may be unrelated to the cluster level.

[1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node structure (Type 0)
[2] https://github.com/hisilicon/hwloc/tree/linux-cluster

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210924085104.44806-2-21cnbao@gmail.com
Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-10 06:29:56 +08:00
Valentin Schneider	26e2355cdc	sched/topology: Introduce SD metaflag for flags needing > 1 groups commit `4ee4ea443a` upstream. ------------------------------------------------------------------------ In preparation of cleaning up the sd_degenerate*() functions, mark flags used in sd_degenerate() with the new SDF_NEEDS_GROUPS flag. With this, build a compile-time mask of those SD flags. Note that sd_parent_degenerate() uses an extra flag in its mask, SD_PREFER_SIBLING, which remains singled out for now. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20200817113003.20802-8-valentin.schneider@arm.com Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-09 17:26:29 +08:00
Valentin Schneider	af30444737	sched/topology: Define and assign sched_domain flag metadata commit `b6e862f386` upstream. ------------------------------------------------------------------------ There are some expectations regarding how sched domain flags should be laid out, but none of them are checked or asserted in sched_domain_debug_one(). After staring at said flags for a while, I've come to realize there's two repeating patterns: - Shared with children: those flags are set from the base CPU domain upwards. Any domain that has it set will have it set in its children. It hints at "some property holds true / some behaviour is enabled until this level". - Shared with parents: those flags are set from the topmost domain downwards. Any domain that has it set will have it set in its parents. It hints at "some property isn't visible / some behaviour is disabled until this level". There are two outliers that (currently) do not map to either of these: o SD_PREFER_SIBLING, which is cleared below levels with SD_ASYM_CPUCAPACITY. The change was introduced by commit: `9c63e84db2` ("sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains") as it could break misfit migration on some systems. In light of this, we might want to change it back to make it fit one of the two categories and fix the issue another way. o SD_ASYM_CPUCAPACITY, which gets set on a single level and isn't propagated up nor down. From a topology description point of view, it really wants to be SDF_SHARED_PARENT; this will be rectified in a later patch. Tweak the sched_domain flag declaration to assign each flag an expected layout, and include the rationale for each flag "meta type" assignment as a comment. Consolidate the flag metadata into an array; the index of a flag's metadata can easily be found with log2(flag), IOW __ffs(flag). 
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20200817113003.20802-5-valentin.schneider@arm.com Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-09 17:25:29 +08:00
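The closing remark of this commit message, that a flag's metadata index is log2(flag), i.e. __ffs(flag), can be shown with a small model. The flag names below (SD_EXAMPLE_*) and the metadata strings are hypothetical placeholders, not the kernel's declarations; only the idea of deriving each flag value from its declaration order and indexing a metadata array by bit position comes from the commit messages in this list.

```python
from typing import List, Tuple

# Hypothetical declaration list: (flag name, expected layout). Order defines the bit.
FLAG_DECLS: List[Tuple[str, str]] = [
    ("SD_EXAMPLE_CHILD", "shared with children"),
    ("SD_EXAMPLE_PARENT", "shared with parents"),
    ("SD_EXAMPLE_OUTLIER", "no expected layout"),
]

# Each flag is a single bit whose position is its index in the declaration list.
FLAGS = {name: 1 << index for index, (name, _) in enumerate(FLAG_DECLS)}


def ffs_index(flag: int) -> int:
    """Index of the only set bit, equal to log2(flag) for a single-bit flag."""
    if flag <= 0 or flag & (flag - 1):
        raise ValueError("expected exactly one bit set")
    return flag.bit_length() - 1


def flag_metadata(flag: int) -> Tuple[str, str]:
    """Look up (name, layout) for a single-bit flag via its bit index."""
    return FLAG_DECLS[ffs_index(flag)]


if __name__ == "__main__":
    print(flag_metadata(FLAGS["SD_EXAMPLE_PARENT"]))
```

Because the array index is just the bit position, no search or hash lookup is needed to go from a flag to its metadata.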
Valentin Schneider	d400332b6c	sched/topology: Split out SD_* flags declaration to its own file commit `d54a9658a7` upstream. ------------------------------------------------------------------------ To associate the SD flags with some metadata, we need some more structure in the way they are declared. Rather than shove that in a free-standing macro list, move the declaration in a separate file that can be re-imported with different SD_FLAG definitions. This is inspired by what is done with the syscall table (see uapi/asm/unistd.h and sys_call_table). The value assigned to a given SD flag now depends on the order it appears in sd_flags.h. No change in functionality. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20200817113003.20802-4-valentin.schneider@arm.com Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-09 17:24:03 +08:00
Valentin Schneider	8e6da45b4c	ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN commit `cfe7ddcbd7` upstream. ------------------------------------------------------------------------ This flag was introduced in 2014 by commit: `d77b3ed5c9` ("sched: Add a new SD_SHARE_POWERDOMAIN for sched_domain") but AFAIA it was never leveraged by the scheduler. The closest thing I can think of is EAS caring about frequency domains, and it does that by leveraging performance domains. Remove the flag. No change in functionality is expected. Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20200817113003.20802-2-valentin.schneider@arm.com Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-09 17:16:43 +08:00
Valentin Schneider	10f3bc1916	sched/topology: Kill SD_LOAD_BALANCE commit `36c5bdc438` upstream. ------------------------------------------------------------------------ That flag is set unconditionally in sd_init(), and no one checks for it anymore. Remove it. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200415210512.805-5-valentin.schneider@arm.com Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-09 17:16:20 +08:00
Valentin Schneider	be160a5148	sched: Remove checks against SD_LOAD_BALANCE commit `e669ac8ab9` upstream. ------------------------------------------------------------------------ The SD_LOAD_BALANCE flag is set unconditionally for all domains in sd_init(). By making the sched_domain->flags sysctl interface read-only, we have removed the last piece of code that could clear that flag - as such, it will now be always present. Rather than keep carrying it along, we can work towards getting rid of it entirely. cpusets don't need it because they can make CPUs be attached to the NULL domain (e.g. cpuset with sched_load_balance=0), or to a partitioned root_domain, i.e. a sched_domain hierarchy that doesn't span the entire system (e.g. root cpuset with sched_load_balance=0 and sibling cpusets with sched_load_balance=1). isolcpus apply the same "trick": isolated CPUs are explicitly taken out of the sched_domain rebuild (using housekeeping_cpumask()), so they get the NULL domain treatment as well. Remove the checks against SD_LOAD_BALANCE. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200415210512.805-4-valentin.schneider@arm.com Signed-off-by: Xue Sinian <tangyuan911@yeah.net>	2024-11-09 17:11:06 +08:00
chinaljp030	f2abf181fe	!245 KVM: arm64: Add support for FEAT_TLBIRANGE Merge pull request !245 from 谢晓东/linux-5.4/devel	2024-11-08 08:45:16 +00:00
chinaljp030	f94b2a0c57	!250 Backport pseudo NMI-based watchdog patch for OpenCloudOS-Kernel Merge pull request !250 from lcy/devel-37	2024-11-06 07:15:09 +00:00
chinaljp030	9a02555d16	!249 Backport pseudo NMI for PMU Merge pull request !249 from lcy/devel-35	2024-11-06 06:57:36 +00:00
chinaljp030	2e3ee079b1	!258 [linux-5.4/devel] x86/mce: Add NMIs setup in machine_check func Merge pull request !258 from LeoLiu-oc/linux-5.4-devel-86-mce-nmi	2024-11-06 06:36:19 +00:00
Xiongfeng Wang	e26e124849	sdei_watchdog: Fix compile error when PPC_WATCHDOG is disable on PowerPC commit 0252aa08aafb4a40ea2d821f58e88e99a644b097 openeuler. When I compile the kernel with CONFIG_PPC_WATCHDOG disabled on PowerPC, I get the following compile error: In file included from kernel/hung_task.c:11:0: ./include/linux/nmi.h: In function ‘touch_nmi_watchdog’: ./include/linux/nmi.h:143:2: error: implicit declaration of function ‘arch_touch_nmi_watchdog’; did you mean ‘touch_nmi_watchdog’? [-Werror=implicit-function-declaration] arch_touch_nmi_watchdog(); ^~~~~~~~~~~~~~~~~~~~~~~ touch_nmi_watchdog This is because CONFIG_HARDLOCKUP_DETECTOR_PERF is still enabled in my configuration. Fix it by excluding arch_touch_nmi_watchdog() only when CONFIG_PPC_WATCHDOG is disabled. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:53 +08:00
Xiongfeng Wang	e3a14898a3	sdei_watchdog: avoid possible false hardlockup commit 0fa83fd0f8f7267be1e31c824cedb9d112504785 openeuler. Firmware may not trigger the SDEI event at the required frequency. The SDEI event may be triggered too soon, which causes a false hardlockup in the kernel. Check the timestamp in sdei_watchdog_callback and skip the hardlockup check if it is invoked too soon. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:53 +08:00
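The "skip the check if the event fired too soon" idea can be sketched as a small timestamp filter. This is an illustration only: the class name, the 4/5 margin, and the nanosecond units are assumptions, not values taken from the patch.

```python
class WatchdogTimestampFilter:
    """Toy model: run the hardlockup check only if enough time has passed."""

    def __init__(self, period_ns: int) -> None:
        self.period_ns = period_ns
        self.last_check_ns = 0

    def refresh(self, now_ns: int) -> None:
        """Reset the reference time, e.g. when the watchdog is (re)enabled."""
        self.last_check_ns = now_ns

    def should_check(self, now_ns: int) -> bool:
        """Return False when the event arrived earlier than the allowed margin."""
        delta = now_ns - self.last_check_ns
        # Assumed margin: tolerate events up to 20% early, skip anything sooner.
        if delta < self.period_ns * 4 // 5:
            return False
        self.last_check_ns = now_ns
        return True


if __name__ == "__main__":
    second = 1_000_000_000
    flt = WatchdogTimestampFilter(period_ns=10 * second)
    flt.refresh(0)
    print(flt.should_check(3 * second), flt.should_check(9 * second))
```

An event arriving 3 s after the reference time is ignored, while one arriving after 9 s passes the margin and becomes the new reference.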
Xiongfeng Wang	1656fd96c3	kprobes/arm64: Blacklist sdei watchdog callback functions commit bdda54cc39843589ee91a0176ca9a94adf307763 openeuler. Functions called in sdei_handler are not allowed to be kprobed, so mark them as NOKPROBE_SYMBOL. There are very many functions in 'watchdog_check_timestamp()'. Luckily, we don't need 'CONFIG_HARDLOCKUP_CHECK_TIMESTAMP' now. So just make CONFIG_SDEI_WATCHDOG depend on !CONFIG_HARDLOCKUP_CHECK_TIMESTAMP in case someone adds 'CONFIG_HARDLOCKUP_CHECK_TIMESTAMP' in the future. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:53 +08:00
Xiongfeng Wang	4108f89bf6	sdei_watchdog: set secure timer period base on 'watchdog_thresh' commit 13ddc12768ca98d36ec03bfa21a30b3ebc91673d openeuler. The period of the secure timer is set to 3s by BIOS. That means the secure timer interrupt will trigger every 3 seconds. To further decrease the NMI watchdog's effect on performance, this patch sets the period of the secure timer based on 'watchdog_thresh'. This variable is initialized to 10s. We can also set the period at runtime by modifying '/proc/sys/kernel/watchdog_thresh'. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:53 +08:00
Xiongfeng Wang	3d3ce61def	sdei_watchdog: clear EOI of the secure timer before kdump commit 75ac7be96da43f12bad247de69137500e02fd37f openeuler. When we panic in hardlockup, the secure timer interrupt remains activate because firmware clear eoi after dispatch is completed. This will cause arm_arch_timer interrupt failed to trigger in the second kernel. This patch add a new SMC helper to clear eoi of a certain interrupt and clear eoi of the secure timer before booting the second kernel. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:53 +08:00
Xiongfeng Wang	fb960c0410	sdei_watchdog: refresh 'last_timestamp' when enabling nmi_watchdog commit 5bc048a102ef9c3748464cacce443a0f1d9bed5b openeuler. The trigger period of secure time is set by firmware. We need to check the time_stamp every time the secure time fires to make sure the hardlockup detection is not executed too soon. We need to refresh 'last_timestamp' to the current time when we enable the nmi_watchdog. Otherwise, false hardlockup may be detected when the secure timer fires the first time. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:53 +08:00
Xiongfeng Wang	f0bfc2e73d	watchdog: add nmi_watchdog support for arm64 based on SDEI commit cc19c0b385e3bd423e20465b06eb232678ce5c16 openeuler. Add nmi_watchdog support for arm64 based on SDEI. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:53 +08:00
Xiongfeng Wang	545b1214b3	lockup_detector: init lockup detector after all the init_calls commit bef7d8e1432400f3d78339ac269167e09c15dabd openeuler. We call 'sdei_init' as 'subsys_initcall_sync'. lockup detector need to be initialised after sdei_init. The influence of this patch is that we can not detect the hard lockup in init_calls. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:53 +08:00
Xiongfeng Wang	00082d7172	firmware: arm_sdei: make 'sdei_api_event_disable/enable' public commit cfaccce945988392d70ad42924e76f330c25ab9a openeuler. NMI Watchdog need to enable the event for each core individually. But the existing public api 'sdei_event_enable' enable events for all cores when the event type is private. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:52 +08:00
Xiongfeng Wang	aced53f8a1	firmware: arm_sdei: add interrupt binding api commit 860744b94a10a159562fc491fd7f3ea1388965c1 openeuler. This patch add a interrupt binding api function which returns the binded event number. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:57:52 +08:00
Xiongfeng Wang	54661581fb	watchdog: make hardlockup detect code public commit 4ffed7d5435d12be6762e6fdef92fd2c67fc27df openeuler. In current code, the hardlockup detect code is contained by CONFIG_HARDLOCKUP_DETECTOR_PERF. This patch makes this code public so that other arch hardlockup detector can use it. Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 18:55:17 +08:00
chinaljp030	879316c425	!257 Backport jbd2 speed up for OpenCloudOS-Kernel Merge pull request !257 from lcy/devel-38	2024-11-05 09:40:52 +00:00
Julien Thierry	0bdcc78fd5	arm_pmu: arm64: Use NMIs for PMU commit `d8f6267f7c` upstream. Add required PMU interrupt operations for NMIs. Request interrupt lines as NMIs when possible, otherwise fall back to normal interrupts. NMIs are only supported on the arm64 architecture with a GICv3 irqchip. [Alexandru E.: Added that NMIs only work on arm64 + GICv3, print message when PMU is using NMIs] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-8-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 17:04:08 +08:00
Julien Thierry	c6c3369a68	arm_pmu: Introduce pmu_irq_ops commit `f76b130bdb` upstream. Currently the PMU interrupt can either be a normal irq or a percpu irq. Supporting NMI will introduce two cases for each existing one. It becomes a mess of 'if's when managing the interrupt. Define sets of callbacks for operations commonly done on the interrupt. The appropriate set of callbacks is selected at interrupt request time and simplifies interrupt enabling/disabling and freeing. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-7-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 17:04:08 +08:00
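The idea of choosing one set of callbacks at interrupt request time, rather than branching on the interrupt type at every call site, might look roughly like the following. This is an illustrative sketch only: `irq_ops_sketch`, `select_ops`, and the recorded `last_action` are hypothetical stand-ins, not the kernel's actual `pmu_irq_ops` definition.

```c
#include <stdbool.h>

/* Records which operation ran last; stands in for touching real hardware. */
enum action_sketch { ACT_NONE, ACT_ENABLE_IRQ, ACT_DISABLE_IRQ,
                     ACT_ENABLE_NMI, ACT_DISABLE_NMI };
static enum action_sketch last_action = ACT_NONE;

/* One table of operations per interrupt flavour. */
struct irq_ops_sketch {
    void (*enable)(unsigned int irq);
    void (*disable)(unsigned int irq);
};

static void irq_enable(unsigned int irq)  { (void)irq; last_action = ACT_ENABLE_IRQ; }
static void irq_disable(unsigned int irq) { (void)irq; last_action = ACT_DISABLE_IRQ; }
static void nmi_enable(unsigned int irq)  { (void)irq; last_action = ACT_ENABLE_NMI; }
static void nmi_disable(unsigned int irq) { (void)irq; last_action = ACT_DISABLE_NMI; }

static const struct irq_ops_sketch normal_ops = { irq_enable, irq_disable };
static const struct irq_ops_sketch nmi_ops    = { nmi_enable, nmi_disable };

/*
 * Pick the table once, when the interrupt is requested: use the NMI
 * variant when available, otherwise fall back to the normal one.
 */
static const struct irq_ops_sketch *select_ops(bool nmi_available)
{
    return nmi_available ? &nmi_ops : &normal_ops;
}
```

After selection, callers only use `ops->enable(irq)` and `ops->disable(irq)`, so no further checks on the interrupt type are needed.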
Julien Thierry	e2cabc0720	KVM: arm64: pmu: Make overflow handler NMI safe commit `95e92e45a4` upstream. kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from NMI context, defer waking the vcpu to an irq_work queue. A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent running the irq_work for a non-existent vcpu by calling irq_work_sync() on the PMU destroy path. [Alexandru E.: Added irq_work_sync()] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Pouloze <suzuki.poulose@arm.com> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20200924110706.254996-6-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 17:04:08 +08:00
Julien Thierry	db279eaccd	arm64: perf: Defer irq_work to IPI_IRQ_WORK commit `05ab728133` upstream. When handling events, armv8pmu_handle_irq() calls perf_event_overflow(), and subsequently calls irq_work_run() to handle any work queued by perf_event_overflow(). As perf_event_overflow() raises IPI_IRQ_WORK when queuing the work, this isn't strictly necessary and the work could be handled as part of the IPI_IRQ_WORK handler. In the common case the IPI handler will run immediately after the PMU IRQ handler, and where the PE is heavily loaded with interrupts other handlers may run first, widening the window where some counters are disabled. In practice this window is unlikely to be a significant issue, and removing the call to irq_work_run() would make the PMU IRQ handler NMI safe in addition to making it simpler, so let's do that. [Alexandru E.: Reworded commit message] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-5-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 17:04:07 +08:00
Julien Thierry	00baa292fc	arm64: perf: Remove PMU locking commit `2a0e2a02e4` upstream. The PMU is disabled and enabled, and the counters are programmed from contexts where interrupts or preemption is disabled. The functions to toggle the PMU and to program the PMU counters access the registers directly and don't access data modified by the interrupt handler. That, and the fact that they're always called from non-preemptible contexts, means that we don't need to disable interrupts or use a spinlock. [Alexandru E.: Explained why locking is not needed, removed WARN_ONs] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-4-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 17:04:07 +08:00
Mark Rutland	2e813f7d0a	arm64: perf: Avoid PMXEV* indirection commit `0fdf1bb759` upstream. Currently we access the counter registers and their respective type registers indirectly. This requires us to write to PMSELR, issue an ISB, then access the relevant PMXEV* registers. This is unfortunate, because: * Under virtualization, accessing one register requires two traps to the hypervisor, even though we could access the register directly with a single trap. * We have to issue an ISB which we could otherwise avoid the cost of. * When we use NMIs, the NMI handler will have to save/restore the select register in case the code it preempted was attempting to access a counter or its type register. We can avoid these issues by directly accessing the relevant registers. This patch adds helpers to do so. In armv8pmu_enable_event() we still need the ISB to prevent the PE from reordering the write to PMINTENSET_EL1 register. If the interrupt is enabled before we disable the counter and the new event is configured, we might get an interrupt triggered by the previously programmed event overflowing, but which we wrongly attribute to the event that we are enabling. Execute an ISB after we disable the counter. In the process, remove the comment that refers to the ARMv7 PMU. [Julien T.: Don't inline read/write functions to avoid big code-size increase, remove unused read_pmevtypern function, fix counter index issue.] [Alexandru E.: Removed comment, removed trailing semicolons in macros, added ISB] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-3-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 17:04:07 +08:00
Alexandru Elisei	42e446dbad	arm64: perf: Add missing ISB in armv8pmu_enable_counter() commit `490d7b7c08` upstream. Writes to the PMXEVTYPER_EL0 register are not self-synchronising. In armv8pmu_enable_event(), the PE can reorder configuring the event type after we have enabled the counter and the interrupt. This can lead to an interrupt being asserted because of the previous event type that we were counting using the same counter, not the one that we've just configured. The same rationale applies to writes to the PMINTENSET_EL1 register. The PE can reorder enabling the interrupt at any point in the future after we have enabled the event. Prevent both situations from happening by adding an ISB just before we enable the event counter. Fixes: `030896885a` ("arm64: Performance counters support") Reported-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 17:00:35 +08:00
Zhang Yi	475e1be393	jbd2: speed up jbd2_transaction_committed() commit 7c73ddb7589fb8ddb1136b6306dfb72089c81511 upstream. jbd2_transaction_committed() is used to check whether a transaction with the given tid has already committed, it holds j_state_lock in read mode and check the tid of current running transaction and committing transaction, but holding the j_state_lock is expensive. We have already stored the sequence number of the most recently committed transaction in journal t->j_commit_sequence, we could do this check by comparing it with the given tid instead. If the given tid isn't smaller than j_commit_sequence, we can ensure that the given transaction has been committed. That way we could drop the expensive lock and achieve about 10% ~ 20% performance gains in concurrent DIOs on may virtual machine with 100G ramdisk. fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \ -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \ -group_reporting Before: overwrite IOPS=88.2k, BW=344MiB/s read IOPS=95.7k, BW=374MiB/s rand overwrite IOPS=98.7k, BW=386MiB/s randread IOPS=102k, BW=397MiB/s After: overwrite IOPS=105k, BW=410MiB/s read IOPS=112k, BW=436MiB/s rand overwrite IOPS=104k, BW=404MiB/s randread IOPS=111k, BW=432MiB/s CC: Dave Chinner <david@fromorbit.com> Suggested-by: Dave Chinner <david@fromorbit.com> Link: https://lore.kernel.org/linux-ext4/ZjILCPNZRHeazSqV@dread.disaster.area/ Signed-off-by: huwentao <huwentao19@h-partners.com>	2024-11-05 15:37:21 +08:00
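The lock-free check described in the row above reduces to a single sequence-number comparison. The sketch below is a user-space approximation under stated assumptions: `journal_sketch` and `transaction_committed_sketch` are hypothetical names, the comparison helper mirrors the usual wraparound-safe idiom for transaction IDs, and the sketch assumes a tid counts as committed once the stored commit sequence has reached or passed it. Real kernel code would also need an atomic read of the shared field, which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tid_t;

/*
 * Wraparound-safe "x >= y" for sequence numbers: the unsigned difference
 * is reinterpreted as signed, so IDs that have wrapped past zero still
 * compare correctly while they are less than 2^31 apart.
 */
static bool tid_geq(tid_t x, tid_t y)
{
    return (int32_t)(x - y) >= 0;
}

/* Hypothetical stand-in for the journal; only the field used here. */
struct journal_sketch {
    tid_t commit_sequence; /* most recently committed transaction */
};

/* No lock taken: compare the requested tid with the stored sequence. */
static bool transaction_committed_sketch(const struct journal_sketch *j,
                                         tid_t tid)
{
    return tid_geq(j->commit_sequence, tid);
}
```

With `commit_sequence` at 100, tids 99 and 100 are reported as committed and 101 is not.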
LeoLiu-oc	b7d24be2e7	x86/mce: Add NMIs setup in machine_check func #MC is a NMI-like exception. But do not do any setup that NMIs need. This will lead to console_owner_lock issue and HPET dead loop issue. For example, The HPET dead loop issue: CPU x CPU x ---- ---- read_hpet() arch_spin_trylock(&hpet.lock) [CPU x got the hpet.lock] #MCE happened do_machine_check() mce_panic() panic() kmsg_dump() pstore_dump() pstore_record_init() ktime_get_real_fast_ns() read_hpet() [dead loops] This may lead to read_hpet dead loops. The console_owner_lock issue is similar. To avoid these issues, add NMIs setup When Handling #MC Exceptions. Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>	2024-11-04 16:42:55 +08:00
谢晓东	1b2905d358	KVM: arm64: Add support for FEAT_TLBIRANGE	2024-10-29 22:16:01 +08:00
chinaljp030	7fbd37f2dd	!236 [linux-5.4/next] Add support of Zhaoxin HDAC and codec Merge pull request !236 from LeoLiu-oc/linux-5.4-next-13-hdac	2024-10-23 09:29:43 +00:00
Weitao Wang	5ab633d0b5	USB: UHCI: adjust zhaoxin UHCI controllers OverCurrent bit value OverCurrent condition is not standardized in the UHCI spec. Zhaoxin UHCI controllers report OverCurrent bit active off. In order to handle OverCurrent condition correctly, the uhci-hcd driver needs to be told to expect the active-off behavior. Suggested-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable@vger.kernel.org Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20230423105952.4526-1-WeitaoWang-oc@zhaoxin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>	2024-10-21 14:27:55 +08:00
LeoLiu-oc	2c24189e6f	ALSA: hda: Add support of Zhaoxin NB HDAC codec zhaoxin inclusion category: feature ------------------- Add Zhaoxin NB HDAC codec support. Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>	2024-10-21 14:04:43 +08:00
LeoLiu-oc	2fa2a551f9	ALSA: hda: Add support of Zhaoxin NB HDAC zhaoxin inclusion category: feature ------------------- Add the new PCI ID 0x1d17 0x9141/0x9142/0x9144 Zhaoxin NB HDAC support. And add some special initialization for Zhaoxin NB HDAC. Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>	2024-10-21 14:02:07 +08:00
leoliu-oc	67bfb6833c	ALSA: hda: Add support of Zhaoxin SB HDAC zhaoxin inclusion category: feature ------------------- Add some special initialization for Zhaoxin SB HDAC. Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>	2024-10-21 13:50:13 +08:00


873733 Commits
