On the Hygon family 18h model 5h controller, some registers such as
GCTL, SD_CTL and SD_CTL_3B must be accessed as dwords, otherwise the
write will fail.
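As a minimal, illustrative sketch (not the actual hda_intel.c change; the
helper name and the use of raw readl()/writel() here are assumptions), a
byte-wide field in such a register can be updated through a 32-bit
read-modify-write:

#include <linux/io.h>

/*
 * Hypothetical helper: update a byte-wide field that lives inside a
 * register which only tolerates dword accesses (e.g. GCTL on Hygon
 * family 18h model 5h), using a 32-bit read-modify-write.
 */
static void hda_update_byte_as_dword(void __iomem *base, unsigned int off,
                                     u8 mask, u8 val)
{
        unsigned int dword_off = off & ~0x3;    /* containing dword */
        unsigned int shift = (off & 0x3) * 8;   /* byte position    */
        u32 v = readl(base + dword_off);

        v &= ~((u32)mask << shift);
        v |= ((u32)(val & mask)) << shift;
        writel(v, base + dword_off);
}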
Conflicts:
sound/pci/hda/hda_intel.c
Signed-off-by: fuhao <fuh@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add the new PCI ID 0x1d94 0x14a9 for the Hygon family 18h model 5h
HDA controller.
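A minimal sketch of the kind of entry this adds to the driver's PCI ID
table (the driver_data flags below are assumptions, not the exact ones
used):

/* Illustrative only: shape of the new azx_ids[] entry. */
static const struct pci_device_id hygon_hda_id[] = {
        { PCI_DEVICE(0x1d94, 0x14a9),   /* Hygon family 18h model 5h HDA */
          .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
        { }
};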
Conflicts:
sound/pci/hda/hda_intel.c
Signed-off-by: fuhao <fuh@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Hygon family 18h model 6h has 2 chip selects (CS) mapped to 1 UMC, so
adjust for it.
Fixes: 6cf6141e ("EDAC/amd64: Add support for Hygon family 18h model 6h")
Signed-off-by: fuhao <fuh@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
The DF ID should be obtained from the DF F5 device for the Hygon family
18h model 6h processor.
Fixes: 3e980dd8 ("x86/amd_nb: Add support for Hygon family 18h model 6h")
Signed-off-by: fuhao <fuh@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Apply the intlv_num_chan modification from commit 0ac06d63 only to
Hygon family 18h model 4h.
Fixes: 0ac06d63 ("EDAC/amd64: Adjust address translation for Hygon family 18h model 4h")
Signed-off-by: fuhao <fuh@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
The HiAddrOffset is always the top 4 bits of the normalized address,
so revert the modification from commit 0ac06d63 back to the original
implementation.
Fixes: 0ac06d63 ("EDAC/amd64: Adjust address translation for Hygon family 18h model 4h")
Signed-off-by: fuhao <fuh@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add F0/F6 device IDs for Hygon family 18h model 6h processor.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Hygon family 18h model 6h processor has the same DF F1 device ID
as M05H_DF_F1.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add support to derive CPU topology for Hygon family 18h model 6h
processor.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add 18H_M05H DF F3 device ID to get the temperature for Hygon
family 18h model 5h processor.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Hygon family 18h model 5h processor has the same F0/F6 device IDs as
F17h_M30h, and the syndrome size is the same as model 4h processor.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add root and DF F1/F3/F4 device IDs for Hygon family 18h model
5h processors. Some model 5h processors have the legacy (M04H) DF
devices, so add a conditional to read the DF F1 register.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add support to derive CPU topology for Hygon family 18h model 5h
processor, and calculate LLC ID for it from the number of threads
sharing the cache.
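A minimal sketch of the idea, mirroring the existing cacheinfo pattern in
the kernel (the CPUID subleaf used below is an assumption):

/*
 * Sketch only: derive the LLC ID from the APIC ID and the number of
 * threads sharing the last-level cache as reported by CPUID 0x8000001d.
 */
static void hygon_set_llc_id(struct cpuinfo_x86 *c, unsigned int cpu)
{
        unsigned int eax, ebx, ecx, edx, num_sharing_cache, bits;

        cpuid_count(0x8000001d, 3, &eax, &ebx, &ecx, &edx);
        num_sharing_cache = ((eax >> 14) & 0xfff) + 1;
        bits = get_count_order(num_sharing_cache);

        per_cpu(cpu_llc_id, cpu) = c->apicid >> bits;
}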
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
The DF F3 device ID used to get the temperature for the Hygon family 18h
model 4h processor is the same as 17H_M30H, but the offsets differ and
may span two separate ranges. The second offset range can be considered
private to Hygon, so describe it with struct hygon_private.
Add a priv pointer to k10temp_data that points to the private data.
Add the functions k10temp_get_ccd_support_2nd() and hygon_read_temp()
to support reading the second offset range.
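A minimal sketch of the shape this private data could take (the field
names below are assumptions based on the description, not the actual
definitions):

/* Sketch only: per-Hygon data describing the second CCD offset range. */
struct hygon_private {
        u32 index_2nd;          /* first CCD index served by the 2nd range */
        u32 offset_2nd;         /* register offset of the 2nd range        */
};

struct k10temp_data {
        struct pci_dev *pdev;
        /* ... existing fields ... */
        void *priv;             /* points to struct hygon_private on Hygon */
};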
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Since commit 028c221ed1 ("x86/CPU/AMD: Save AMD NodeId as cpu_die_id"),
cpuinfo_x86.cpu_die_id is obtained from CPUID or an MSR. However, the
value may not be contiguous for Hygon model 4h~6h processors.
Using cpuinfo_x86.logical_die_id always yields contiguous die (or node)
IDs, because it converts the physical die ID to a logical die ID.
So use topology_logical_die_id() instead of topology_die_id() to
decode UMC ECC errors for Hygon processors.
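The gist of the substitution, as a sketch (the helper and call site are
illustrative, not the exact EDAC code):

/* Sketch: map the CPU that reported the error to a contiguous node ID. */
static inline int hygon_umc_node_id(struct mce *m)
{
        /* was: topology_die_id(m->extcpu) */
        return topology_logical_die_id(m->extcpu);
}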
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add Hygon family 18h model 4h processor support for DramOffset and
HiAddrOffset, and get the socket interleaving number from
DramBaseAddress (D18F0x110).
Update intlv_num_chan and num_intlv_bits support for Hygon family 18h
model 4h processor.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Hygon family 18h model 4h processor has the same F0/F6 device IDs
as F17h_M30h.
Add support to determine which DDR memory types it supports, but
determine syndrome sizes from different bits of the DRAM ECC control
register.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
On Hygon family 18h platforms, we look at the 6th nibble (bits 20~23)
of the instance_id to derive the channel number.
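As a one-line sketch (the helper name is illustrative):

/* Channel number lives in bits 23:20 (the 6th nibble) of instance_id. */
static inline u32 hygon_umc_channel(u32 instance_id)
{
        return (instance_id >> 20) & 0xf;
}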
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
The SB IOAPIC is on device 0xb from Hygon family 18h model 4h.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add dedicated functions to initialize the northbridge for Hygon family
18h model 4h processors.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add the PCI device IDs for Hygon family 18h model 4h processors.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Add support for loading Hygon microcode, which is compatible with the AMD one.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Hygon processors before model 4h do not use CPUID leaf 0xB, so use the
logic from commit e0ceeae708 ("x86/CPU/hygon: Fix phys_proc_id calculation
logic for multi-die processors") to derive the socket ID when running on a
host. If the kernel is running in a guest, use the hypervisor's default.
For model 4h, Hygon processors use CPUID leaf 0xB to identify the SMT and
CORE level types, so use detect_extended_topology() to derive
the core ID, socket ID and APIC ID. But still set __max_die_per_package
to nodes_per_socket because leaf 0xB lacks the DIE level type.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
commit 4c1cdec319 upstream.
The maximum number of MCA banks is 64 (MAX_NR_BANKS), see
a0bc32b3ca ("x86/mce: Increase maximum number of banks to 64").
However, the bank_map which contains a bitfield of which banks to
initialize is of type unsigned int and that overflows when those bit
numbers are >= 32, leading to UBSAN complaining correctly:
UBSAN: shift-out-of-bounds in arch/x86/kernel/cpu/mce/amd.c:1365:38
shift exponent 32 is too large for 32-bit type 'int'
Change the bank_map to a u64 and use the proper BIT_ULL() macro when
modifying bits in there.
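A minimal sketch of the resulting pattern (the helper and declaration
placement are illustrative):

/* bank_map must hold up to MAX_NR_BANKS (64) bits, so use a u64 and
 * BIT_ULL() instead of a 32-bit type and BIT().
 */
static DEFINE_PER_CPU_READ_MOSTLY(u64, bank_map);

static void mark_bank(unsigned int bank)
{
        this_cpu_or(bank_map, BIT_ULL(bank));   /* safe for bank >= 32 */
}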
[ bp: Rewrite commit message. ]
Fixes: a0bc32b3ca ("x86/mce: Increase maximum number of banks to 64")
Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127151601.1068324-1-muralimk@amd.com
[fix conflict during backport]
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
commit 02a2484cf8 upstream.
Tdie is an offset calculation that should only be shown when temp_offset
is actually put into a table. It is useless to show it for every CPU/APU.
Show it only when necessary.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
commit 280b68a3b3 upstream.
Hygon systems support the MONITOR/MWAIT instructions and these can be
used for ACPI C1 in the same way as on AMD and Intel systems.
The BIOS declares a C1 state in _CST to use FFH and CPUID_Fn00000005_EDX
is non-zero on Hygon systems.
Allow ffh_cstate_init() to succeed on Hygon systems so that FFH MWAIT is
used by default instead of HALT for ACPI C1.
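The gist is to accept the Hygon vendor in ffh_cstate_init()'s vendor check,
roughly (a sketch, not a verbatim copy of arch/x86/kernel/acpi/cstate.c):

static int __init ffh_cstate_init(void)
{
        struct cpuinfo_x86 *c = &boot_cpu_data;

        if (c->x86_vendor != X86_VENDOR_INTEL &&
            c->x86_vendor != X86_VENDOR_AMD &&
            c->x86_vendor != X86_VENDOR_HYGON)  /* newly allowed */
                return -1;

        cpu_cstate_entry = alloc_percpu(struct cstate_entry);
        return 0;
}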
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210528081417.31474-1-puwen@hygon.cn
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
commit 59eca2fa19 upstream.
Set the maximum DIE per package variable on Hygon using the
nodes_per_socket value in order to do per-DIE manipulations for drivers
such as powercap.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210302020217.1827-1-puwen@hygon.cn
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
commit cb09a37972 upstream.
CPUID Leaf 0x1F defines a DIE_TYPE level (nb: ECX[8:15] level type == 0x5),
but CPUID Leaf 0xB does not. However, detect_extended_topology() will
set struct cpuinfo_x86.cpu_die_id regardless of whether a valid Die ID
was found.
Only set cpu_die_id if a DIE_TYPE level is found. CPU topology code may
use another value for cpu_die_id, e.g. the AMD NodeId on AMD-based
systems. Code ordering should be maintained so that the CPUID Leaf 0x1F
Die ID value will take precedence on systems that may use another value.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-5-Yazen.Ghannam@amd.com
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
commit a0bc32b3ca upstream.
...because future AMD systems will support up to 64 MCA banks per CPU.
MAX_NR_BANKS is used to allocate a number of data structures, and it is
used as a ceiling for values read from MCG_CAP[Count]. Therefore, this
change will have no functional effect on existing systems with 32 or
fewer MCA banks per CPU.
However, this will increase the size of the following structures:
Global bitmaps:
- core.c / mce_banks_ce_disabled
- core.c / all_banks
- core.c / valid_banks
- core.c / toclear
- Total: 32 new bits * 4 bitmaps = 16 new bytes
Per-CPU bitmaps:
- core.c / mce_poll_banks
- intel.c / mce_banks_owned
- Total: 32 new bits * 2 bitmaps = 8 new bytes
The bitmaps are arrays of longs. So this change will only affect 32-bit
execution, since there will be one additional long used. There will be
no additional memory use on 64-bit execution, because the size of long
is 64 bits.
Global structs:
- amd.c / struct smca_bank smca_banks[]: 16 bytes per bank
- core.c / struct mce_bank_dev mce_bank_devs[]: 56 bytes per bank
- Total: 32 new banks * (16 + 56) bytes = 2304 new bytes
Per-CPU structs:
- core.c / struct mce_bank mce_banks_array[]: 16 bytes per bank
- Total: 32 new banks * 16 bytes = 512 new bytes
32-bit
Total global size increase: 2320 bytes
Total per-CPU size increase: 520 bytes
64-bit
Total global size increase: 2304 bytes
Total per-CPU size increase: 512 bytes
This additional memory should still fit within the existing .data
section of the kernel binary. However, in the case where it doesn't
fit, an additional page (4kB) of memory will be added to the binary to
accommodate the extra data which will be the maximum size increase of
vmlinux.
Signed-off-by: Akshay Gupta <Akshay.Gupta@amd.com>
[ Adjust commit message and code comment. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200828192412.320052-1-Yazen.Ghannam@amd.com
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
commit 384b02d6b8 upstream.
Add device HID HYGO0010 to match the Hygon ACPI Vendor ID (HYGO) that
was registered in http://www.uefi.org/acpi_id_list, and the I2C
controller on the Hygon platform will use this HID.
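As a minimal sketch of how such a HID is typically matched (the table name
and driver_data value are assumptions, not the exact driver change):

/* Sketch only: ACPI ID table entry for the Hygon I2C controller HID. */
static const struct acpi_device_id hygon_i2c_acpi_ids[] = {
        { "HYGO0010", 0 },      /* driver_data value is an assumption */
        { }
};
MODULE_DEVICE_TABLE(acpi, hygon_i2c_acpi_ids);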
Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[fix conflict during backport]
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: caelli <caelli@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
QEMU does not support the IOMMU, so PCG wants to use the vfio kernel
module with enable_unsafe_noiommu_mode=1 to improve the speed.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: aurelianliu <aurelianliu@tencent.com>
Signed-off-by: Yongliang Gao <leonylgao@tencent.com>
do_machine_check() already calls nmi_enter() before mce_panic(), so
there is no need to call nmi_enter() again outside of do_machine_check().
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
Some customers want to install the OS from a USB card or the BMC, and the
OS treats the BMC as a USB storage device. So do not put the usb storage
ko in the kernel modules-public-removable-media rpm, and delete the
kernel-modules-public rpm.
More details:
--story=1020422414120102612
--bug=1020426283131556911
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
Currently, uname -r outputs 5.4.241-24-0017.xx while the rpm package
is kernel-5.4.241-24.0017.xx, which do not match. Fix this
by changing the uname output to 5.4.241-24.0017.xx.
Signed-off-by: caelli <caelli@tencent.com>
Reviewed-by: yuehongwu <yuehongwu@tencent.com>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
commit 222d84a0de0d2dbd75b2c73f469d74868955f3b5 openeuler.
--------------------------------
The sched_domain_shared structure is only used as a pointer, and other
drivers don't use it directly.
Signed-off-by: Xue Sinian <tangyuan911@yeah.net>
mainline inclusion
from mainline-v6.0-rc1
commit 70fb5ccf2e upstream.
--------------------------------
[Problem Statement]
select_idle_cpu() might spend too much time searching for an idle CPU,
when the system is overloaded.
The following histogram is the time spent in select_idle_cpu(),
when running 224 instances of netperf on a system with 112 CPUs
per LLC domain:
@usecs:
[0] 533 | |
[1] 5495 | |
[2, 4) 12008 | |
[4, 8) 239252 | |
[8, 16) 4041924 |@@@@@@@@@@@@@@ |
[16, 32) 12357398 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32, 64) 14820255 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128) 13047682 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[128, 256) 8235013 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[256, 512) 4507667 |@@@@@@@@@@@@@@@ |
[512, 1K) 2600472 |@@@@@@@@@ |
[1K, 2K) 927912 |@@@ |
[2K, 4K) 218720 | |
[4K, 8K) 98161 | |
[8K, 16K) 37722 | |
[16K, 32K) 6715 | |
[32K, 64K) 477 | |
[64K, 128K) 7 | |
netperf latency usecs:
=======
case load Lat_99th std%
TCP_RR thread-224 257.39 ( 0.21)
The time spent in select_idle_cpu() is visible to netperf and might have a negative
impact.
[Symptom analysis]
The patch [1] from Mel Gorman has been applied to track the efficiency
of select_idle_sibling. Copy the indicators here:
SIS Search Efficiency(se_eff%):
A ratio expressed as a percentage of runqueues scanned versus
idle CPUs found. A 100% efficiency indicates that the target,
prev or recent CPU of a task was idle at wakeup. The lower the
efficiency, the more runqueues were scanned before an idle CPU
was found.
SIS Domain Search Efficiency(dom_eff%):
Similar, except only for the slower SIS
patch.
SIS Fast Success Rate(fast_rate%):
Percentage of SIS that used target, prev or
recent CPUs.
SIS Success rate(success_rate%):
Percentage of scans that found an idle CPU.
The test is based on Aubrey's schedtests tool, including netperf, hackbench,
schbench and tbench.
Test on vanilla kernel:
schedstat_parse.py -f netperf_vanilla.log
case load se_eff% dom_eff% fast_rate% success_rate%
TCP_RR 28 threads 99.978 18.535 99.995 100.000
TCP_RR 56 threads 99.397 5.671 99.964 100.000
TCP_RR 84 threads 21.721 6.818 73.632 100.000
TCP_RR 112 threads 12.500 5.533 59.000 100.000
TCP_RR 140 threads 8.524 4.535 49.020 100.000
TCP_RR 168 threads 6.438 3.945 40.309 99.999
TCP_RR 196 threads 5.397 3.718 32.320 99.982
TCP_RR 224 threads 4.874 3.661 25.775 99.767
UDP_RR 28 threads 99.988 17.704 99.997 100.000
UDP_RR 56 threads 99.528 5.977 99.970 100.000
UDP_RR 84 threads 24.219 6.992 76.479 100.000
UDP_RR 112 threads 13.907 5.706 62.538 100.000
UDP_RR 140 threads 9.408 4.699 52.519 100.000
UDP_RR 168 threads 7.095 4.077 44.352 100.000
UDP_RR 196 threads 5.757 3.775 35.764 99.991
UDP_RR 224 threads 5.124 3.704 28.748 99.860
schedstat_parse.py -f schbench_vanilla.log
(each group has 28 tasks)
case load se_eff% dom_eff% fast_rate% success_rate%
normal 1 mthread 99.152 6.400 99.941 100.000
normal 2 mthreads 97.844 4.003 99.908 100.000
normal 3 mthreads 96.395 2.118 99.917 99.998
normal 4 mthreads 55.288 1.451 98.615 99.804
normal 5 mthreads 7.004 1.870 45.597 61.036
normal 6 mthreads 3.354 1.346 20.777 34.230
normal 7 mthreads 2.183 1.028 11.257 21.055
normal 8 mthreads 1.653 0.825 7.849 15.549
schedstat_parse.py -f hackbench_vanilla.log
(each group has 28 tasks)
case load se_eff% dom_eff% fast_rate% success_rate%
process-pipe 1 group 99.991 7.692 99.999 100.000
process-pipe 2 groups 99.934 4.615 99.997 100.000
process-pipe 3 groups 99.597 3.198 99.987 100.000
process-pipe 4 groups 98.378 2.464 99.958 100.000
process-pipe 5 groups 27.474 3.653 89.811 99.800
process-pipe 6 groups 20.201 4.098 82.763 99.570
process-pipe 7 groups 16.423 4.156 77.398 99.316
process-pipe 8 groups 13.165 3.920 72.232 98.828
process-sockets 1 group 99.977 5.882 99.999 100.000
process-sockets 2 groups 99.927 5.505 99.996 100.000
process-sockets 3 groups 99.397 3.250 99.980 100.000
process-sockets 4 groups 79.680 4.258 98.864 99.998
process-sockets 5 groups 7.673 2.503 63.659 92.115
process-sockets 6 groups 4.642 1.584 58.946 88.048
process-sockets 7 groups 3.493 1.379 49.816 81.164
process-sockets 8 groups 3.015 1.407 40.845 75.500
threads-pipe 1 group 99.997 0.000 100.000 100.000
threads-pipe 2 groups 99.894 2.932 99.997 100.000
threads-pipe 3 groups 99.611 4.117 99.983 100.000
threads-pipe 4 groups 97.703 2.624 99.937 100.000
threads-pipe 5 groups 22.919 3.623 87.150 99.764
threads-pipe 6 groups 18.016 4.038 80.491 99.557
threads-pipe 7 groups 14.663 3.991 75.239 99.247
threads-pipe 8 groups 12.242 3.808 70.651 98.644
threads-sockets 1 group 99.990 6.667 99.999 100.000
threads-sockets 2 groups 99.940 5.114 99.997 100.000
threads-sockets 3 groups 99.469 4.115 99.977 100.000
threads-sockets 4 groups 87.528 4.038 99.400 100.000
threads-sockets 5 groups 6.942 2.398 59.244 88.337
threads-sockets 6 groups 4.359 1.954 49.448 87.860
threads-sockets 7 groups 2.845 1.345 41.198 77.102
threads-sockets 8 groups 2.871 1.404 38.512 74.312
schedstat_parse.py -f tbench_vanilla.log
case load se_eff% dom_eff% fast_rate% success_rate%
loopback 28 threads 99.976 18.369 99.995 100.000
loopback 56 threads 99.222 7.799 99.934 100.000
loopback 84 threads 19.723 6.819 70.215 100.000
loopback 112 threads 11.283 5.371 55.371 99.999
loopback 140 threads 0.000 0.000 0.000 0.000
loopback 168 threads 0.000 0.000 0.000 0.000
loopback 196 threads 0.000 0.000 0.000 0.000
loopback 224 threads 0.000 0.000 0.000 0.000
According to the test above, if the system becomes busy, the
SIS Search Efficiency(se_eff%) drops significantly. Although some
benchmarks would finally find an idle CPU(success_rate% = 100%), it is
doubtful whether it is worth it to search the whole LLC domain.
[Proposal]
It would be ideal to have a crystal ball to answer this question:
How many CPUs must a wakeup path walk down, before it can find an idle
CPU? Many potential metrics could be used to predict the number.
One candidate is the sum of util_avg in this LLC domain. The benefit
of choosing util_avg is that it is a metric of accumulated historic
activity, which seems to be smoother than instantaneous metrics
(such as rq->nr_running). Besides, choosing the sum of util_avg
would help predict the load of the LLC domain more precisely, because
SIS_PROP uses one CPU's idle time to estimate the total LLC domain idle
time.
In summary, the lower the util_avg is, the more select_idle_cpu()
should scan for idle CPU, and vice versa. When the sum of util_avg
in this LLC domain hits 85% or above, the scan stops. The reason to
choose 85% as the threshold is that this is the imbalance_pct(117)
when a LLC sched group is overloaded.
Introduce the quadratic function:
y = SCHED_CAPACITY_SCALE - p * x^2
and y'= y / SCHED_CAPACITY_SCALE
x is the ratio of sum_util compared to the CPU capacity:
x = sum_util / (llc_weight * SCHED_CAPACITY_SCALE)
y' is the ratio of CPUs to be scanned in the LLC domain,
and the number of CPUs to scan is calculated by:
nr_scan = llc_weight * y'
Choosing quadratic function is because:
[1] Compared to the linear function, it scans more aggressively when the
sum_util is low.
[2] Compared to the exponential function, it is easier to calculate.
[3] It seems that there is no accurate mapping between the sum of util_avg
and the number of CPUs to be scanned. Use heuristic scan for now.
For a platform with 112 CPUs per LLC, the number of CPUs to scan is:
sum_util%     0    5   15   25   35   45   55   65   75   85   86 ...
scan_nr     112  111  108  102   93   81   65   47   25    1    0 ...
For a platform with 16 CPUs per LLC, the number of CPUs to scan is:
sum_util%     0    5   15   25   35   45   55   65   75   85   86 ...
scan_nr      16   15   15   14   13   11    9    6    3    0    0 ...
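As a standalone cross-check of the two tables above (an illustrative
userspace sketch, not the in-kernel implementation; imbalance_pct = 117
gives the ~85% cut-off):

#define SCHED_CAPACITY_SCALE 1024LL

/* nr_scan for a given sum of util_avg in the LLC domain (sketch only). */
static long long sis_util_nr_scan(long long sum_util, long long llc_weight)
{
        long long pct = 117;                    /* sd->imbalance_pct  */
        long long x = sum_util / llc_weight;    /* x, scaled by 1024  */
        long long tmp = x * x * pct * pct / (10000 * SCHED_CAPACITY_SCALE);
        long long y;

        if (tmp > SCHED_CAPACITY_SCALE)
                tmp = SCHED_CAPACITY_SCALE;
        y = SCHED_CAPACITY_SCALE - tmp;         /* y', scaled by 1024 */

        return llc_weight * y / SCHED_CAPACITY_SCALE;
}

For example, with llc_weight = 112 and sum_util at 45% of the LLC capacity
(0.45 * 112 * 1024), this returns 81, matching the 112-CPU row above.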
Furthermore, to minimize the overhead of calculating the metrics in
select_idle_cpu(), borrow the statistics from periodic load balance.
As mentioned by Abel, on a platform with 112 CPUs per LLC, the
sum_util calculated by periodic load balance after 112 ms would
decay to about 0.5 * 0.5 * 0.5 * 0.7 = 8.75%, thus bringing a delay
in reflecting the latest utilization. But it is a trade-off.
Checking the util_avg in newidle load balance would be more frequent,
but it brings overhead - multiple CPUs write/read the per-LLC shared
variable and introduces cache contention. Tim also mentioned that,
it is allowed to be non-optimal in terms of scheduling for the
short-term variations, but if there is a long-term trend in the load
behavior, the scheduler can adjust for that.
When SIS_UTIL is enabled, the select_idle_cpu() uses the nr_scan
calculated by SIS_UTIL instead of the one from SIS_PROP. As Peter and
Mel suggested, SIS_UTIL should be enabled by default.
This patch is based on the util_avg, which is very sensitive to the
CPU frequency invariance. There is an issue that, when the max frequency
has been clamped, the util_avg would decay insanely fast when
the CPU is idle. Commit addca28512 ("cpufreq: intel_pstate: Handle no_turbo
in frequency invariance") could be used to mitigate this symptom, by adjusting
the arch_max_freq_ratio when turbo is disabled. But this issue is still
not thoroughly fixed, because the current code is unaware of the user-specified
max CPU frequency.
[Test result]
netperf and tbench were launched with 25% 50% 75% 100% 125% 150%
175% 200% of CPU number respectively. Hackbench and schbench were launched
by 1, 2 ,4, 8 groups. Each test lasts for 100 seconds and repeats 3 times.
The following is the benchmark result comparison between
baseline:vanilla v5.19-rc1 and compare:patched kernel. Positive compare%
indicates better performance.
Each netperf test is a:
netperf -4 -H 127.0.1 -t TCP/UDP_RR -c -C -l 100
netperf.throughput
=======
case load baseline(std%) compare%( std%)
TCP_RR 28 threads 1.00 ( 0.34) -0.16 ( 0.40)
TCP_RR 56 threads 1.00 ( 0.19) -0.02 ( 0.20)
TCP_RR 84 threads 1.00 ( 0.39) -0.47 ( 0.40)
TCP_RR 112 threads 1.00 ( 0.21) -0.66 ( 0.22)
TCP_RR 140 threads 1.00 ( 0.19) -0.69 ( 0.19)
TCP_RR 168 threads 1.00 ( 0.18) -0.48 ( 0.18)
TCP_RR 196 threads 1.00 ( 0.16) +194.70 ( 16.43)
TCP_RR 224 threads 1.00 ( 0.16) +197.30 ( 7.85)
UDP_RR 28 threads 1.00 ( 0.37) +0.35 ( 0.33)
UDP_RR 56 threads 1.00 ( 11.18) -0.32 ( 0.21)
UDP_RR 84 threads 1.00 ( 1.46) -0.98 ( 0.32)
UDP_RR 112 threads 1.00 ( 28.85) -2.48 ( 19.61)
UDP_RR 140 threads 1.00 ( 0.70) -0.71 ( 14.04)
UDP_RR 168 threads 1.00 ( 14.33) -0.26 ( 11.16)
UDP_RR 196 threads 1.00 ( 12.92) +186.92 ( 20.93)
UDP_RR 224 threads 1.00 ( 11.74) +196.79 ( 18.62)
Take the 224 threads as an example, the SIS search metrics changes are
illustrated below:
vanilla patched
4544492 +237.5% 15338634 sched_debug.cpu.sis_domain_search.avg
38539 +39686.8% 15333634 sched_debug.cpu.sis_failed.avg
128300000 -87.9% 15551326 sched_debug.cpu.sis_scanned.avg
5842896 +162.7% 15347978 sched_debug.cpu.sis_search.avg
There is -87.9% less CPU scans after patched, which indicates lower overhead.
Besides, with this patch applied, there is -13% less rq lock contention
in perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested
.try_to_wake_up.default_wake_function.woken_wake_function.
This might help explain the performance improvement - Because this patch allows
the waking task to remain on the previous CPU, rather than grabbing other CPUs'
lock.
Each hackbench test is a:
hackbench -g $job --process/threads --pipe/sockets -l 1000000 -s 100
hackbench.throughput
=========
case load baseline(std%) compare%( std%)
process-pipe 1 group 1.00 ( 1.29) +0.57 ( 0.47)
process-pipe 2 groups 1.00 ( 0.27) +0.77 ( 0.81)
process-pipe 4 groups 1.00 ( 0.26) +1.17 ( 0.02)
process-pipe 8 groups 1.00 ( 0.15) -4.79 ( 0.02)
process-sockets 1 group 1.00 ( 0.63) -0.92 ( 0.13)
process-sockets 2 groups 1.00 ( 0.03) -0.83 ( 0.14)
process-sockets 4 groups 1.00 ( 0.40) +5.20 ( 0.26)
process-sockets 8 groups 1.00 ( 0.04) +3.52 ( 0.03)
threads-pipe 1 group 1.00 ( 1.28) +0.07 ( 0.14)
threads-pipe 2 groups 1.00 ( 0.22) -0.49 ( 0.74)
threads-pipe 4 groups 1.00 ( 0.05) +1.88 ( 0.13)
threads-pipe 8 groups 1.00 ( 0.09) -4.90 ( 0.06)
threads-sockets 1 group 1.00 ( 0.25) -0.70 ( 0.53)
threads-sockets 2 groups 1.00 ( 0.10) -0.63 ( 0.26)
threads-sockets 4 groups 1.00 ( 0.19) +11.92 ( 0.24)
threads-sockets 8 groups 1.00 ( 0.08) +4.31 ( 0.11)
Each tbench test is a:
tbench -t 100 $job 127.0.0.1
tbench.throughput
======
case load baseline(std%) compare%( std%)
loopback 28 threads 1.00 ( 0.06) -0.14 ( 0.09)
loopback 56 threads 1.00 ( 0.03) -0.04 ( 0.17)
loopback 84 threads 1.00 ( 0.05) +0.36 ( 0.13)
loopback 112 threads 1.00 ( 0.03) +0.51 ( 0.03)
loopback 140 threads 1.00 ( 0.02) -1.67 ( 0.19)
loopback 168 threads 1.00 ( 0.38) +1.27 ( 0.27)
loopback 196 threads 1.00 ( 0.11) +1.34 ( 0.17)
loopback 224 threads 1.00 ( 0.11) +1.67 ( 0.22)
Each schbench test is a:
schbench -m $job -t 28 -r 100 -s 30000 -c 30000
schbench.latency_90%_us
========
case load baseline(std%) compare%( std%)
normal 1 mthread 1.00 ( 31.22) -7.36 ( 20.25)*
normal 2 mthreads 1.00 ( 2.45) -0.48 ( 1.79)
normal 4 mthreads 1.00 ( 1.69) +0.45 ( 0.64)
normal 8 mthreads 1.00 ( 5.47) +9.81 ( 14.28)
*Consider the Standard Deviation, this -7.36% regression might not be valid.
Also, an OLTP workload with a commercial RDBMS has been tested, and there
is no significant change.
There were concerns that unbalanced tasks among CPUs would cause problems.
For example, suppose the LLC domain is composed of 8 CPUs, and 7 tasks are
bound to CPU0~CPU6, while CPU7 is idle:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
util_avg 1024 1024 1024 1024 1024 1024 1024 0
Since the util_avg ratio is 87.5%( = 7/8 ), which is higher than 85%,
select_idle_cpu() will not scan, thus CPU7 is undetected during scan.
But according to Mel, it is unlikely the CPU7 will be idle all the time
because CPU7 could pull some tasks via CPU_NEWLY_IDLE.
lkp (kernel test robot) has reported a regression on stress-ng.sock on a
very busy system. According to the sched_debug statistics, it might be
caused by SIS_UTIL terminating the scan and choosing a previous CPU
earlier, and this might introduce more context switches, especially
involuntary preemption, which impacts a busy stress-ng. This regression
has shown that not all benchmarks in every scenario benefit from the idle
CPU scan limit, and it needs further investigation.
Besides, there is slight regression in hackbench's 16 groups case when the
LLC domain has 16 CPUs. Prateek mentioned that we should scan aggressively
in an LLC domain with 16 CPUs. Because the cost to search for an idle one
among 16 CPUs is negligible. The current patch aims to propose a generic
solution and only considers the util_avg. Something like the below could
be applied on top of the current patch to fulfill the requirement:
if (llc_weight <= 16)
nr_scan = nr_scan * 32 / llc_weight;
For an LLC domain with 16 CPUs, nr_scan will be expanded to 2 times its
original value. The smaller the number of CPUs the LLC domain has, the
larger nr_scan will be expanded. This needs further investigation.
There is also ongoing work[2] from Abel to filter out the busy CPUs during
wakeup, to further speed up the idle CPU scan. And it could be a following-up
optimization on top of this change.
Suggested-by: Tim Chen <tim.c.chen@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Tested-by: Mohini Narkhede <mohini.narkhede@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220612163428.849378-1-yu.c.chen@intel.com
Signed-off-by: Xue Sinian <tangyuan911@yeah.net>