OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jianping Liu	edc9e113fa	config,arm64: enable CONFIG_OPENVSWITCH Some partners need to use openvswitch, so enable CONFIG_OPENVSWITCH in aarch64, which has already been enabled in the x86_64 config. Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-11-04 12:54:06 +08:00
Jianping Liu	ac8052d038	dist,spec: support kernel-debug in core and modules and devel rpm When CONFIG="generic-release", %{rpm_name} is kernel, when CONFIG="generic-debug", %{rpm_name} is kernel-debug. Provides: kernel-debug-core in kernel-debug-core rpm Provides: kernel-debug-modules in kernel-debug-modules rpm Provides: kernel-debug-devel in kernel-debug-devel rpm Signed-off-by: Jianping Liu <frankjpliu@tencent.com> Reviewed-by: Yongliang Gao <leonylgao@tencent.com>	2024-09-27 11:03:34 +08:00
Jianping Liu	3ffed28eb5	dist,Makefile: generic-debug config only build kernel rpm We intend to archive the kernel-debug rpm in yum. The release kernel already builds the perf/tools/bpf-tools rpms; to avoid kernel-debug building the same rpms, disable them. Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-09-25 17:18:16 +08:00
Alex Shi	4c67e8518e	compiler: fix instrumentation_begin redefine issue [tapd] https://tapd.woa.com/20422414/prong/stories/view/1020422414114939518 The contents (instrumentation_begin/instrumentation_end) already defined in include/linux/instrumentation.h Signed-off-by: Alex Shi <alexsshi@tencent.com> Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-09-25 17:17:50 +08:00
Jianping Liu	a8e4f391c2	dist: release 5.4.119-20.0009.34 Upstream: no Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-09-23 13:08:36 +08:00
Toke Høiland-Jørgensen	b902278a4a	bpf: Always return target ifindex in bpf_fib_lookup commit `d1c362e1dd` upstream. The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will pass the packet up the stack in this case. However, with the addition of bpf_redirect_neigh() that can be used instead to perform the neighbour lookup, at the cost of a bit of duplicated work. For that we still need the target ifindex, and since bpf_fib_lookup() already has that at the time it performs the neighbour lookup, there is really no reason why it can't just return it in any case. So let's just always return the ifindex if the FIB lookup itself succeeds. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/bpf/20201009184234.134214-1-toke@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-09-23 13:03:46 +08:00
Jianping Liu	b11f5368c2	dist: release 5.4.119-20.0009.33 Upstream: no Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-09-19 20:00:47 +08:00
Jianping Liu	23786a77d2	config,arm64: open some infiniband config To support ZhongXing NIC RDMA, open CONFIGs as below: CONFIG_INFINIBAND_USER_MAD=m CONFIG_INFINIBAND_USER_ACCESS=m CONFIG_INFINIBAND_USER_MEM=y CONFIG_INFINIBAND_ON_DEMAND_PAGING=y	2024-09-19 19:54:38 +08:00
Jason Xing	8690c0362b	sssnic: add one dependency in Kconfig Getting rid of prefix 'CONFIG_' can solve the issue. In the previous patch, I missed one place. Most importantly, I added a dependency on CONFIG_PCI_ATS since this driver relies on CONFIG_PCI_ATS, so we need to adjust the Kconfig files. Fixes: ee280c4189a13 ("sssnic: support this new driver") Signed-off-by: Jason Xing <kernelxing@tencent.com>	2024-06-27 10:57:00 +08:00
刘诗	67295d85cf	drm: support virtualbox display insmod oc iso by virtualbox, need adjust accuracy. Signed-off-by: aurelianliu <aurelianliu@tencent.com>	2024-06-13 15:03:27 +08:00
Jianping Liu	e33c4e5ff8	dist: release 5.4.119-20.0009.32 Upstream: no Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-06-11 21:03:06 +08:00
Jianping Liu	d0ebc6b8e3	config: sync config to origin 5.4.119-20.0009.32 Sync config.default_kasan and config.performance to match 5.4.119-20.0009.32. Note, config.default_kasan and config.performance are useless now. Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-06-11 21:01:05 +08:00
Fuhai Wang	47f0d31be8	netfilter: nf_tables: reject QUEUE/DROP verdict parameters commit f342de4e2f33e0e39165d8639387aa6c19dff660 upstream. This reverts commit `e0abdadcc6`. core.c:nf_hook_slow assumes that the upper 16 bits of NF_DROP verdicts contain a valid errno, i.e. -EPERM, -EHOSTUNREACH or similar, or 0. Due to the reverted commit, it's possible to provide a positive value, e.g. NF_ACCEPT (1), which results in use-after-free. It's not clear to me why this commit was made. NF_QUEUE is not used by nftables; "queue" rules in nftables will result in use of "nft_queue" expression. If we later need to allow specifying errno values from userspace (do not know why), this has to call NF_DROP_GETERR and check that "err <= 0" holds true. Fixes: `e0abdadcc6` ("netfilter: nf_tables: accept QUEUE/DROP verdict parameters") Cc: stable@vger.kernel.org Reported-by: Notselwyn <notselwyn@pwning.tech> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Fuhai Wang <fuhaiwang@tencent.com> Signed-off-by: caelli <caelli@tencent.com>	2024-06-11 20:52:46 +08:00
Vidya Sagar	0baa807ed7	PCI/MSI: Set device flag indicating only 32-bit MSI support commit `2053230af1` upstream. The MSI-X Capability requires devices to support 64-bit Message Addresses, but the MSI Capability can support either 32- or 64-bit addresses. Previously, we set dev->no_64bit_msi for a few broken devices that advertise 64-bit MSI support but don't correctly support it. In addition, check the MSI "64-bit Address Capable" bit for all devices and set dev->no_64bit_msi for devices that don't advertise 64-bit support. This allows msi_verify_entries() to catch arch code defects that assign 64-bit addresses when they're not supported. The warning is helpful to find defects like the one fixed by https://lore.kernel.org/r/20201117165312.25847-1-vidyas@nvidia.com [bhelgaas: set no_64bit_msi in pci_msi_init(), commit log] Link: https://lore.kernel.org/r/20201124105035.24573-1-vidyas@nvidia.com Link: https://lore.kernel.org/r/20201203185110.1583077-4-helgaas@kernel.org Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Xinghui Li <korantli@tencent.com> Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-06-11 20:52:46 +08:00
Bjorn Helgaas	fa40951618	PCI/MSI: Move MSI/MSI-X init to msi.c commit `cbc40d5c33` upstream. Conflicts: drivers/pci/probe.c conflict with the comment's position Move pci_msi_setup_pci_dev(), which disables MSI and MSI-X interrupts, from probe.c to msi.c so it's with all the other MSI code and more consistent with other capability initialization. This means we must compile msi.c always, even without CONFIG_PCI_MSI, so wrap the rest of msi.c in an #ifdef and adjust the Makefile accordingly. No functional change intended. Link: https://lore.kernel.org/r/20201203185110.1583077-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Xinghui Li <korantli@tencent.com> Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-06-11 20:52:45 +08:00
Chen Siyu	f6f1ca6db5	update can driver for phytium D2000 Soc fix the problem for setting bitrate on phytium D2000 Soc Signed-off-by: Chen Siyu <chensiyu1321@phytium.com.cn> Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-06-11 20:52:45 +08:00
wangyong	70f373423d	Merge CVE-2021-3760 bugfix patch, and the source of the patch is as follows: `1b1499a` nfc: nci: fix the UAF of rf_conn_info object Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-06-11 20:52:45 +08:00
wjl00563	6d6ee45625	powerpc: export arch_trigger_cpumask_backtrace Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-06-11 20:52:44 +08:00
Duanqiang Wen	1b321e2778	config: add txgbe mod default option in X86_64 or arm64 arch, set CONFIG_TXGBE=m. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>	2024-06-11 20:52:44 +08:00
Duanqiang Wen	b72144b5bd	net: wangxun: txgbe: support wangxun 10GbE driver add support for Wangxun 10GbE driver, source files and functions are the same as wangxun out of box driver release version txgbe-1.3.5.1. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>	2024-06-11 20:52:44 +08:00
Duanqiang Wen	50bf5d4ebd	config: add ngbe mod default option in X86_64 or arm64 arch, set CONFIG_WANGXUN=y, and CONFIG_NGBE=m. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>	2024-06-11 20:52:43 +08:00
Duanqiang Wen	467b4bd334	net: wangxun: ngbe: support wangxun 1GbE driver add support for wangxun 1GbE driver, source files and functions are the same as wangxun oob ngbe-1.2.5.3. Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>	2024-06-11 20:52:43 +08:00
Jason Xing	2ae9cf86d2	sssnic: support this new driver All files are extracted from 3snic-eth-3s9xx-driver-sssnic-1.0.6.4-1-src.tar.gz. Here is some key information: 1) please do not remove those #ifdef for compatibility something like this because it could hinder your steps. 2) replace four files as written in scripts/release.sh file: cp -p -f $MK_DIR/replace/makefile_hw $HW_DIR/Makefile cp -p -f $MK_DIR/replace/makefile_nic $NIC_DIR/Makefile cp -p -f $MK_DIR/replace/sss_linux_kernel.h $CUR_DIR/include/kernel/sss_linux_kernel.h cp -p -f $MK_DIR/replace/sss_hwdev_link.c $CUR_DIR/hw/sss_hwdev_link.c The reason here is very simple: compatibility. 3) Add vlan config dependency in Kconfig and do not add more specific configs into some config files. 4) get rid of "Werror" in Makefile. 5) If someone is willing to update to a new version, please keep the makefile and config untouched which I rewrote for the compatibility. Signed-off-by: Jason Xing <kernelxing@tencent.com> Signed-off-by: Jianping Liu <frankjpliu@tencent.com>	2024-06-11 20:52:42 +08:00
Jianping Liu	bf0d1166c8	dist: provide kernel version info in kernelcore.rpm Other software, such as anaconda, needs kernelcore.rpm to provide kernel version info. Signed-off-by: Jianping Liu <frankjpliu@tencent.com> Reviewed-by: Yongliang Gao <leonylgao@tencent.com>	2024-06-11 20:52:42 +08:00
Alex Shi	2c9d8a93d6	config/x86: add usb net driver support Enable them as modules. Signed-off-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:52:42 +08:00
Alex Shi	217e598d36	Version: 5.4.119-20.0009 This is new version of 5.4.119-20.0009. Signed-off-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:52:41 +08:00
Amir Goldstein	8c16d4b7aa	fsnotify: invalidate dcache before IN_DELETE event commit `a37d9a17f0` upstream. Apparently, there are some applications that use IN_DELETE event as an invalidation mechanism and expect that if they try to open a file with the name reported with the delete event, that it should not contain the content of the deleted file. Commit `49246466a9` ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify will have access to a positive dentry. This allowed a race where opening the deleted file via cached dentry is now possible after receiving the IN_DELETE event. To fix the regression, create a new hook fsnotify_delete() that takes the unlinked inode as an argument and use a helper d_delete_notify() to pin the inode, so we can pass it to fsnotify_delete() after d_delete(). Backporting hint: this regression is from v5.3. Although patch will apply with only trivial conflicts to v5.4 and v5.10, it won't build, because fsnotify_delete() implementation is different in each of those versions (see fsnotify_link()). A follow up patch will fix the fsnotify_unlink/rmdir() calls in pseudo filesystem that do not need to call d_delete(). Link: https://lore.kernel.org/r/20220120215305.282577-1-amir73il@gmail.com Reported-by: Ivan Delalande <colona@arista.com> Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/ Fixes: `49246466a9` ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:52:41 +08:00
Kees Cook	75e3d56004	exec: Force single empty string when argv is empty commit `dcd46d897a` upstream. Quoting[1] Ariadne Conill: "In several other operating systems, it is a hard requirement that the second argument to execve(2) be the name of a program, thus prohibiting a scenario where argc < 1. POSIX 2017 also recommends this behaviour, but it is not an explicit requirement[2]: The argument arg0 should point to a filename string that is associated with the process being started by one of the exec functions. ... Interestingly, Michael Kerrisk opened an issue about this in 2008[3], but there was no consensus to support fixing this issue then. Hopefully now that CVE-2021-4034 shows practical exploitative use[4] of this bug in a shellcode, we can reconsider. This issue is being tracked in the KSPP issue tracker[5]." While the initial code searches[6][7] turned up what appeared to be mostly corner case tests, trying to just reject argv == NULL (or an immediately terminated pointer list) quickly started tripping[8] existing userspace programs. The next best approach is forcing a single empty string into argv and adjusting argc to match. The number of programs depending on argc == 0 seems a smaller set than those calling execve with a NULL argv. Account for the additional stack space in bprm_stack_limits(). Inject an empty string when argc == 0 (and set argc = 1). Warn about the case so userspace has some notice about the change: process './argc0' launched './argc0' with NULL argv: empty string added Additionally WARN() and reject NULL argv usage for kernel threads.
[1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.org/ [2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html [3] https://bugzilla.kernel.org/show_bug.cgi?id=8408 [4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt [5] https://github.com/KSPP/linux/issues/176 [6] https://codesearch.debian.net/search?q=execve%5C+%5C%28%5B%5E%2C%5D%2B%2C+NULL&literal=0 [7] https://codesearch.debian.net/search?q=execlp%3F%5Cs%5C%28%5B%5E%2C%5D%2B%2C%5CsNULL&literal=0 [8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/ Reported-by: Ariadne Conill <ariadne@dereferenced.org> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Ariadne Conill <ariadne@dereferenced.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org [vegard: fixed conflicts due to missing 886d7de631da71e30909980fdbf318f7caade262^- and 3950e975431bc914f7e81b8f2a2dbdf2064acb0f^-] Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sixtywang <sixtywang@tencent.com>	2024-06-11 20:52:41 +08:00
Amir Goldstein	b329e7371a	inotify: show inotify mask flags in proc fdinfo commit `a32e697cda` upstream. The inotify mask flags IN_ONESHOT and IN_EXCL_UNLINK are not "internal to kernel" and should be exposed in procfs fdinfo so CRIU can restore them. Fixes: `6933599697` ("inotify: hide internal kernel bits from fdinfo") Link: https://lore.kernel.org/r/20220422120327.3459282-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:52:40 +08:00
Jiakun Shuai	6c49ee52b1	OC-Phytium config for OpenCloudOS 8 oc-phytium.config is presets for the kernel driver function support of OpenCloudOS 8 on the Phytium desktop and embedded platforms. Signed-off-by: Jiakun Shuai <shuaijiakun1288@phytium.com.cn>	2024-06-11 20:52:40 +08:00
Jiakun Shuai	ffb71a516d	Default disabled Phytium drivers on OpenCloudOS 8.8 This commit documents the Phytium kernel driver module that is disabled by default on OpenCloudOS 8.8. Users can configure and enable these modules as needed. The reason why the module is disabled by default: It may cause conflicts on platforms other than Phytium, or it is disabled by default due to single and rare usage scenarios, in order to improve the efficiency of kernel building. Signed-off-by: Jiakun Shuai <shuaijiakun1288@phytium.com.cn>	2024-06-11 20:52:40 +08:00
Alex Shi	c0ddf78f1f	dist: release 5.4.119-20.0009.28 Upstream: no Signed-off-by: Alex Shi <alexsshi@tencent.com>	2024-06-11 20:52:39 +08:00
Christoph Hellwig	b868ff6759	virtio-blk: remove VIRTIO_BLK_F_SCSI support [upstream commit: `782e067dba`] Since the need for a special flag to support SCSI passthrough on a block device was added in May 2017 the SCSI passthrough support in virtio-blk has been disabled. It has always been a bad idea (just ask the original author..) and we have virtio-scsi for proper passthrough. The feature also never made it into the virtio 1.0 or later specifications. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: XiaoLei Zhu <leonzzhu@tencent.com>	2024-06-11 20:52:37 +08:00
Peter Zijlstra	66c8ef4a22	cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE commit `32d4fd5751` upstream. Commit `c227233ad6` ("intel_idle: enable interrupts before C1 on Xeons") wrecked intel_idle in two ways: - must not have tracing in idle functions - must return with IRQs disabled Additionally, it added a branch for no good reason. Fixes: `c227233ad6` ("intel_idle: enable interrupts before C1 on Xeons") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ rjw: Moved the intel_idle() kerneldoc comment next to the function ] Cc: 5.16+ <stable@vger.kernel.org> # 5.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:29 +08:00
Artem Bityutskiy	6f7b0b5f2b	intel_idle: make SPR C1 and C1E be independent commit `1548fac47a` upstream. This patch partially reverts the changes made by the following commit: `da0e58c038` intel_idle: add 'preferred_cstates' module argument As that commit describes, on early Sapphire Rapids Xeon platforms the C1 and C1E states were mutually exclusive, so that users could only have either C1 and C6, or C1E and C6. However, Intel firmware engineers managed to remove this limitation and make C1 and C1E to be completely independent, just like on previous Xeon platforms. Therefore, this patch: * Removes commentary describing the old, and now non-existing SPR C1E limitation. * Marks SPR C1E as available by default. * Removes the 'preferred_cstates' parameter handling for SPR. Both C1 and C1E will be available regardless of 'preferred_cstates' value. We expect that all SPR systems are shipping with new firmware, which includes the C1/C1E improvement. Cc: v5.18+ <stable@vger.kernel.org> # v5.18+ Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:29 +08:00
Artem Bityutskiy	3d77bca89a	intel_idle: Fix the 'preferred_cstates' module parameter commit `39c184a6a9` upstream. Problem description. When user boots kernel up with the 'intel_idle.preferred_cstates=4' option, we enable C1E and disable C1 states on Sapphire Rapids Xeon (SPR). In order for C1E to work on SPR, we have to enable the C1E promotion bit on all CPUs. However, we enable it only on one CPU. Fix description. The 'intel_idle' driver already has the infrastructure for disabling C1E promotion on every CPU. This patch uses the same infrastructure for enabling C1E promotion on every CPU. It changes the boolean 'disable_promotion_to_c1e' variable to a tri-state 'c1e_promotion' variable. Tested on a 2-socket SPR system. I verified the following combinations: * C1E promotion enabled and disabled in BIOS. * Booted with and without the 'intel_idle.preferred_cstates=4' kernel argument. In all 4 cases C1E promotion was correctly set on all CPUs. Also tested on an old Broadwell system, just to make sure it does not cause a regression. C1E promotion was correctly disabled on that system, both C1 and C1E were exposed (as expected). Fixes: `da0e58c038` ("intel_idle: add 'preferred_cstates' module argument") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> [ rjw: Minor changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:28 +08:00
Artem Bityutskiy	0f46fd086e	intel_idle: Fix SPR C6 optimization commit `7eac3bd38d` upstream. The Sapphire Rapids (SPR) C6 optimization was added to the end of the 'spr_idle_state_table_update()' function. However, the function has a 'return' which may happen before the optimization has a chance to run. And this may prevent the optimization from happening. This is an unlikely scenario, but possible if user boots with, say, the 'intel_idle.preferred_cstates=6' kernel boot option. This patch fixes the issue by eliminating the problematic 'return' statement. Fixes: `3a9cf77b60` ("intel_idle: add core C6 optimization for SPR") Suggested-by: Jan Beulich <jbeulich@suse.com> Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> [ rjw: Minor changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:28 +08:00
Artem Bityutskiy	b39482cff5	intel_idle: add core C6 optimization for SPR commit `3a9cf77b60` upstream. Add a Sapphire Rapids Xeon C6 optimization, similar to what we have for Sky Lake Xeon: if package C6 is disabled, adjust C6 exit latency and target residency to match core C6 values, instead of using the default package C6 values. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:28 +08:00
Artem Bityutskiy	53e20e738f	intel_idle: add 'preferred_cstates' module argument commit `da0e58c038` upstream. On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually exclusive - only one of them can be enabled. By default, 'intel_idle' driver enables C1 and disables C1E. However, some users prefer to use C1E instead of C1, because it saves more energy. This patch adds a new module parameter ('preferred_cstates') for enabling C1E and disabling C1. Here is the idea behind it. 1. This option has effect only for "mutually exclusive" C-states like C1 and C1E on SPR. 2. It does not have any effect on independent C-states, which do not require other C-states to be disabled (most states on most platforms as of today). 3. For mutually exclusive C-states, the 'intel_idle' driver always has a reasonable default, such as enabling C1 on SPR by default. On other platforms, the default may be different. 4. Users can override the default using the 'preferred_cstates' parameter. 5. The parameter accepts the preferred C-states bit-mask, similarly to the existing 'states_off' parameter. 6. This parameter is not limited to C1/C1E, and leaves room for supporting other mutually exclusive C-states, if they come in the future. Today 'intel_idle' can only be compiled-in, which means that on SPR, in order to disable C1 and enable C1E, users should boot with the following kernel argument: intel_idle.preferred_cstates=4 Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:27 +08:00
Tony Luck	223b89f1b6	x86/cpu: Add Sapphire Rapids CPU model number commit `be25d1b5ea` upstream. Latest edition (039) of "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" includes three new CPU model numbers. Linux already has the two Ice Lake server ones. Add the new model number for Sapphire Rapids. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200603173352.15506-1-tony.luck@intel.com Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:27 +08:00
Artem Bityutskiy	f79fb80ce5	intel_idle: add SPR support commit `9edf3c0ffe` upstream. Add Sapphire Rapids Xeon support. Up until very recently, the C1 and C1E C-states were independent, but this has changed in some new chips, including Sapphire Rapids Xeon (SPR). In these chips the C1 and C1E states cannot be enabled at the same time. The "C1E promotion" bit in 'MSR_IA32_POWER_CTL' also has its semantics changed a bit. Here are the C1, C1E, and "C1E promotion" bit rules on Xeons before SPR. 1. If C1E promotion bit is disabled. a. C1 requests end up with C1 C-state. b. C1E requests end up with C1E C-state. 2. If C1E promotion bit is enabled. a. C1 requests end up with C1E C-state. b. C1E requests end up with C1E C-state. Here are the C1, C1E, and "C1E promotion" bit rules on Sapphire Rapids Xeon. 1. If C1E promotion bit is disabled. a. C1 requests end up with C1 C-state. b. C1E requests end up with C1 C-state. 2. If C1E promotion bit is enabled. a. C1 requests end up with C1E C-state. b. C1E requests end up with C1E C-state. Before SPR Xeon, the 'intel_idle' driver was disabling C1E promotion and was exposing C1 and C1E as independent C-states. But on SPR, C1 and C1E cannot be enabled at the same time. This patch adds both C1 and C1E states. However, C1E is marked with the "CPUIDLE_FLAG_UNUSABLE" flag, which means that it won't be registered by default. The C1E promotion bit will be cleared, which means that by default only C1 and C6 will be registered on SPR. The next patch will add an option for enabling C1E and disabling C1 on SPR. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:27 +08:00
Artem Bityutskiy	686588a7bb	intel_idle: enable interrupts before C1 on Xeons commit `c227233ad6` upstream. Enable local interrupts before requesting C1 on the last two generations of Intel Xeon platforms: Sky Lake, Cascade Lake, Cooper Lake, Ice Lake. This decreases average C1 interrupt latency by about 5-10%, as measured with the 'wult' tool. The '->enter()' function of the driver enters C-states with local interrupts disabled by executing the 'monitor' and 'mwait' pair of instructions. If an interrupt happens, the CPU exits the C-state and continues executing instructions after 'mwait'. It does not jump to the interrupt handler, because local interrupts are disabled. The cpuidle subsystem enables interrupts a bit later, after doing some housekeeping. With this patch, we enable local interrupts before requesting C1. In this case, if the CPU wakes up because of an interrupt, it will jump to the interrupt handler right away. The cpuidle housekeeping will be done after the pending interrupt(s) are handled. Enabling interrupts before entering a C-state has measurable impact for faster C-states, like C1. Deeper, but slower C-states like C6 do not really benefit from this sort of change, because their latency is a lot higher comparing to the delay added by cpuidle housekeeping. This change was also tested with cyclictest and dbench. In case of Ice Lake, the average cyclictest latency decreased by 5.1%, and the average 'dbench' throughput increased by about 0.8%. Both tests were run for 4 hours with only C1 enabled (all other idle states, including 'POLL', were disabled). CPU frequency was pinned to HFM, and uncore frequency was pinned to the maximum value. The other platforms had similar single-digit percentage improvements. It is worth noting that this patch affects 'cpuidle' statistics a tiny bit. Before this patch, C1 residency did not include the interrupt handling time, but with this patch, it will include it. 
This is similar to what happens in case of the 'POLL' state, which also runs with interrupts enabled. Suggested-by: Len Brown <len.brown@intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:26 +08:00
Artem Bityutskiy	dcaa41dcb4	intel_idle: add Icelake-D support commit `22141d5f41` upstream. This patch adds Icelake Xeon D support to the intel_idle driver. Since Icelake D and Icelake SP C-state characteristics are the same, we use the Icelake SP C-states table for Icelake D as well. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:26 +08:00
Tom Rix	ce6df7b720	intel_idle: remove definition of DEBUG commit `651bc5816c` upstream. Defining DEBUG should only be done in development. So remove DEBUG. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:26 +08:00
Peter Zijlstra	91d50bd49f	intel_idle: Build fix commit `4d916140bf` upstream. Because CONFIG_ soup. Fixes: `6e1d2bc675` ("intel_idle: Fix intel_idle() vs tracing") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201130115402.GO3040@hirez.programming.kicks-ass.net Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:25 +08:00
Peter Zijlstra	69a0944244	intel_idle: Fix intel_idle() vs tracing commit `6e1d2bc675` upstream. cpuidle->enter() callbacks should not call into tracing because RCU has already been disabled. Instead of doing the broadcast thing itself, simply advertise to the cpuidle core that those states stop the timer. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20201123143510.GR3021@hirez.programming.kicks-ass.net Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:25 +08:00
Chen Yu	24b0b982e5	intel_idle: Fix max_cstate for processor models without C-state tables commit `4e0ba5577d` upstream. Currently intel_idle driver gets the c-state information from ACPI _CST if the processor model is not recognized by it. However the c-state in _CST starts with index 1 which is different from the index in intel_idle driver's internal c-state table. While intel_idle_max_cstate_reached() was previously introduced to deal with intel_idle driver's internal c-state table, re-using this function directly on _CST is incorrect. Fix this by subtracting 1 from the index when checking max_cstate in the _CST case. For example, append intel_idle.max_cstate=1 in boot command line, Before the patch: grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name POLL After the patch: grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name /sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI Fixes: `18734958e9` ("intel_idle: Use ACPI _CST for processor models without C-state tables") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Cc: 5.6+ <stable@vger.kernel.org> # 5.6+ Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:25 +08:00
Mel Gorman	dc516a70f4	intel_idle: Ignore _CST if control cannot be taken from the platform commit `75af76d0a3` upstream. `e6d4f08a67` ("intel_idle: Use ACPI _CST on server systems") avoids enabling c-states that have been disabled by the platform with the exception of C1E. Unfortunately, BIOS implementations are not always consistent in terms of how capabilities are advertised and control cannot always be handed over. If control cannot be handed over then intel_idle reports that "ACPI _CST not found or not usable" but does not clear acpi_state_table.count meaning the information is still partially used. This patch ignores ACPI information if CST control cannot be requested from the platform. This was only observed on a number of Haswell platforms that had identical CPUs but not identical BIOS versions. While this problem may be rare overall, 24 separate test cases bisected to this specific commit across 4 separate test machines and is worth addressing. If the situation occurs, the kernel behaves as it did before commit `e6d4f08a67` and uses any c-states that are discovered. The affected test cases were all ones that involved a small number of processes -- exec microbenchmark, pipe microbenchmark, git test suite, netperf, tbench with one client and system call microbenchmark. Each case benefits from being able to use turboboost which is prevented if the lower c-states are unavailable. This may mask real regressions specific to older hardware so it is worth addressing. 
C-state status before and after the patch 5.9.0-vanilla POLL latency:0 disabled:0 default:enabled 5.9.0-vanilla C1 latency:2 disabled:0 default:enabled 5.9.0-vanilla C1E latency:10 disabled:0 default:enabled 5.9.0-vanilla C3 latency:33 disabled:1 default:disabled 5.9.0-vanilla C6 latency:133 disabled:1 default:disabled 5.9.0-ignore-cst-v1r1 POLL latency:0 disabled:0 default:enabled 5.9.0-ignore-cst-v1r1 C1 latency:2 disabled:0 default:enabled 5.9.0-ignore-cst-v1r1 C1E latency:10 disabled:0 default:enabled 5.9.0-ignore-cst-v1r1 C3 latency:33 disabled:0 default:enabled 5.9.0-ignore-cst-v1r1 C6 latency:133 disabled:0 default:enabled Patch enables C3/C6. Netperf UDP_STREAM netperf-udp 5.5.0 5.9.0 vanilla ignore-cst-v1r1 Hmean send-64 193.41 ( 0.00%) 226.54 * 17.13%* Hmean send-128 392.16 ( 0.00%) 450.54 * 14.89%* Hmean send-256 769.94 ( 0.00%) 881.85 * 14.53%* Hmean send-1024 2994.21 ( 0.00%) 3468.95 * 15.85%* Hmean send-2048 5725.60 ( 0.00%) 6628.99 * 15.78%* Hmean send-3312 8468.36 ( 0.00%) 10288.02 * 21.49%* Hmean send-4096 10135.46 ( 0.00%) 12387.57 * 22.22%* Hmean send-8192 17142.07 ( 0.00%) 19748.11 * 15.20%* Hmean send-16384 28539.71 ( 0.00%) 30084.45 * 5.41%* Fixes: `e6d4f08a67` ("intel_idle: Use ACPI _CST on server systems") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: 5.6+ <stable@vger.kernel.org> # 5.6+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:24 +08:00
Chen Zhuo	5cdab9b63d	cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic commit `bf9282dc26` upstream. This allows moving the leave_mm() call into generic code before rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:24 +08:00
Rafael J. Wysocki	fd63805ee8	intel_idle: Add __initdata annotations to init time variables commit `7f843dd712` upstream. Annotate static variables cpuidle_state_table and mwait_substates with __initdata, because they are only used during the initialization of the driver. Also notice that static variable icpu could be annotated analogously and the structure pointed to by it could be __initconst, but two of its fields are accessed via icpu in intel_idle_cpu_init() and auto_demotion_disable(), so introduce two new static variables, auto_demotion_disable_flags and disable_promotion_to_c1e, to hold the values of these fields, set them during the initialization and use them in those functions instead of accessing the source data structure via icpu. That allows icpu to be annotated with __initdata, so do that, and it will also allow some __initconst annotations to be added subsequently. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Zhuo <sagazchen@tencent.com> Signed-off-by: Xinghui Li <korantli@tencent.com>	2024-06-11 20:51:24 +08:00


873723 Commits All Branches Search
