Commit Graph

1228223 Commits

Author SHA1 Message Date
Jianping Liu 45d1845157 Merge OCK next branch to TK5 master branch 2024-09-18 11:02:58 +08:00
xiongmengbiao df2c80fc64 crypto: ccp: fix the sev_do_cmd panic on non-Hygon platforms
The Hygon platform indirectly accesses the `sev_cmd_mutex` variable
through `hygon_psp_hooks`.

However, on non-Hygon platforms (such as AMD), `hygon_psp_hooks` is
not initialized, so `sev_cmd_mutex` should be accessed directly.

Signed-off-by: xiongmengbiao <xiongmengbiao@hygon.cn>
2024-09-14 14:29:54 +08:00
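The fallback described in the commit above can be sketched as follows. This is a minimal user-space illustration, not the driver code: `struct fake_mutex`, `struct vendor_hooks`, the `initialized` flag, and both function names are hypothetical stand-ins. Only `sev_cmd_mutex` and `hygon_psp_hooks` are names taken from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel mutex, so the sketch runs in user space. */
struct fake_mutex {
	int locked;
};

/* Hypothetical hook table; assumed to be filled in only on Hygon platforms. */
struct vendor_hooks {
	int initialized;
	struct fake_mutex *sev_cmd_mutex;
};

static struct fake_mutex sev_cmd_mutex;     /* the directly accessible mutex */
static struct vendor_hooks hygon_psp_hooks; /* zero-initialized: hooks not set up */

/* Use the hook's mutex only when the hooks were set up, else the direct one. */
struct fake_mutex *select_sev_cmd_mutex(void)
{
	if (hygon_psp_hooks.initialized && hygon_psp_hooks.sev_cmd_mutex != NULL)
		return hygon_psp_hooks.sev_cmd_mutex;
	return &sev_cmd_mutex;
}

/* Take whichever mutex applies around the (omitted) command submission. */
int sev_do_cmd_sketch(void)
{
	struct fake_mutex *m = select_sev_cmd_mutex();

	m->locked = 1;
	/* ... the command would be issued here ... */
	m->locked = 0;
	return 0;
}
```

With the hook table left uninitialized, as on a non-Hygon platform, `select_sev_cmd_mutex()` returns the address of the direct `sev_cmd_mutex` instead of following a pointer that was never set.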
Jianping Liu 28fb95817d emm: update to v0.1.7.3
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-09-14 11:15:50 +08:00
Kairui Song 51254ba2b2 emm: fix cgroup initialization check
Should check for cgroup switch instead of root cgroup status.

Signed-off-by: Kairui Song <kasong@tencent.com>
2024-09-13 17:55:43 +08:00
Like Xu d7d7d354bb tools headers UAPI: Sync kvm headers with the kernel sources
commit e30dca91e5667568a6be54886020c43f1f6f95d3 upstream.

To pick the changes in:
  bb58b90b1a8f753b ("KVM: Introduce KVM_SET_USER_MEMORY_REGION2")

That automatically adds support for new ioctl KVM_SET_USER_MEMORY_REGION2.

Link: https://lore.kernel.org/lkml/ZbVLbkngp4oq13qN@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
[ likexu: backport KVM_SET_USER_MEMORY_REGION2 only ]
Signed-off-by: Like Xu <likexu@tencent.com>
2024-09-12 07:19:36 +00:00
Sean Christopherson 6777512fe4 KVM: x86: Prevent excluding the BSP on setting max_vcpu_ids
commit d29bf2ca140410705447ac26100a149b51094c00 upstream.

If the BSP vCPU ID was already set, ensure it doesn't get excluded when
limiting vCPU IDs via KVM_CAP_MAX_VCPU_ID.

[mks: provide commit message, code by Sean]

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20240614202859.3597745-4-minipli@grsecurity.net
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
2024-09-12 07:19:36 +00:00
Mathias Krause d3e47ab4a1 KVM: x86: Limit check IDs for KVM_SET_BOOT_CPU_ID
commit 7c305d5118e67d1773158304f1d5128949aea726 upstream.

Do not accept IDs which are definitely invalid by limit checking the
passed value against KVM_MAX_VCPU_IDS and 'max_vcpu_ids' if it was
already set.

This ensures invalid values, especially on 64-bit systems, don't go
unnoticed and lead to a valid id by chance when truncated by the final
assignment.

Fixes: 73880c80aa ("KVM: Break dependency between vcpu index in vcpus array and vcpu_id.")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20240614202859.3597745-3-minipli@grsecurity.net
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
2024-09-12 07:19:36 +00:00
Mathias Krause 79a7768be7 KVM: Reject overly excessive IDs in KVM_CREATE_VCPU
commit 8b8e57e5096e47ca842c100c25667195017014ae upstream.

If, on a 64 bit system, a vCPU ID is provided that has the upper 32 bits
set to a non-zero value, it may get accepted if the truncated to 32 bits
integer value is below KVM_MAX_VCPU_IDS and 'max_vcpus'. This feels very
wrong and triggered the reporting logic of PaX's SIZE_OVERFLOW plugin.

Instead of silently truncating and accepting such values, pass the full
value to kvm_vm_ioctl_create_vcpu() and make the existing limit checks
return an error.

Even if this is a userland ABI breaking change, no sane userland could
have ever relied on that behaviour.

Reported-by: PaX's SIZE_OVERFLOW plugin running on grsecurity's syzkaller
Fixes: 6aa8b732ca ("[PATCH] kvm: userspace interface")
Cc: Emese Revfy <re.emese@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20240614202859.3597745-2-minipli@grsecurity.net
[sean: tweak comment about INT_MAX assertion]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
2024-09-12 07:19:36 +00:00
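The truncation problem described in the KVM_CREATE_VCPU commit above can be illustrated with a small user-space sketch. It is not the KVM code: `MAX_VCPU_IDS` and both function names are hypothetical, and the limit value is chosen only for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_VCPU_IDS 4096u /* illustrative limit, not the kernel's value */

/* Old pattern: truncate to 32 bits first, then range-check the result. */
int create_vcpu_truncating(uint64_t arg)
{
	uint32_t id = (uint32_t)arg; /* upper 32 bits are silently dropped */

	if (id >= MAX_VCPU_IDS)
		return -EINVAL;
	return (int)id;
}

/* Fixed pattern: range-check the full 64-bit value before narrowing it. */
int create_vcpu_checked(uint64_t arg)
{
	if (arg >= MAX_VCPU_IDS)
		return -EINVAL;
	return (int)arg;
}
```

For an argument such as `0x100000005`, the truncating variant accepts the request as ID 5, while the checked variant rejects it with `-EINVAL`. The same ordering (check first, narrow afterwards) is what the KVM_SET_BOOT_CPU_ID commit above asks for.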
Sean Christopherson ae83f9906e KVM: x86: Make x2APIC ID 100% readonly
commit 4b7c3f6d04bd53f2e5b228b6821fb8f5d1ba3071 upstream.

Ignore the userspace provided x2APIC ID when fixing up APIC state for
KVM_SET_LAPIC, i.e. make the x2APIC fully readonly in KVM.  Commit
a92e2543d6 ("KVM: x86: use hardware-compatible format for APIC ID
register"), which added the fixup, didn't intend to allow userspace to
modify the x2APIC ID.  In fact, that commit is when KVM first started
treating the x2APIC ID as readonly, apparently to fix some race:

 static inline u32 kvm_apic_id(struct kvm_lapic *apic)
 {
-       return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
+       /* To avoid a race between apic_base and following APIC_ID update when
+        * switching to x2apic_mode, the x2apic mode returns initial x2apic id.
+        */
+       if (apic_x2apic_mode(apic))
+               return apic->vcpu->vcpu_id;
+
+       return kvm_lapic_get_reg(apic, APIC_ID) >> 24;
 }

Furthermore, KVM doesn't support delivering interrupts to vCPUs with a
modified x2APIC ID, but KVM *does* return the modified value on a guest
RDMSR and for KVM_GET_LAPIC.  I.e. no remotely sane setup can actually
work with a modified x2APIC ID.

Making the x2APIC ID fully readonly fixes a WARN in KVM's optimized map
calculation, which expects the LDR to align with the x2APIC ID.

  WARNING: CPU: 2 PID: 958 at arch/x86/kvm/lapic.c:331 kvm_recalculate_apic_map+0x609/0xa00 [kvm]
  CPU: 2 PID: 958 Comm: recalc_apic_map Not tainted 6.4.0-rc3-vanilla+ #35
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014
  RIP: 0010:kvm_recalculate_apic_map+0x609/0xa00 [kvm]
  Call Trace:
   <TASK>
   kvm_apic_set_state+0x1cf/0x5b0 [kvm]
   kvm_arch_vcpu_ioctl+0x1806/0x2100 [kvm]
   kvm_vcpu_ioctl+0x663/0x8a0 [kvm]
   __x64_sys_ioctl+0xb8/0xf0
   do_syscall_64+0x56/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0
  RIP: 0033:0x7fade8b9dd6f

Unfortunately, the WARN can still trigger for other CPUs than the current
one by racing against KVM_SET_LAPIC, so remove it completely.

Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/all/814baa0c-1eaa-4503-129f-059917365e80@rbox.co
Reported-by: Haoyu Wu <haoyuwu254@gmail.com>
Closes: https://lore.kernel.org/all/20240126161633.62529-1-haoyuwu254@gmail.com
Reported-by: syzbot+545f1326f405db4e1c3e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000c2a6b9061cbca3c3@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240802202941.344889-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <likexu@tencent.com>
2024-09-12 07:19:36 +00:00
Sean Christopherson 7287b19950 KVM: Introduce KVM_SET_USER_MEMORY_REGION2
commit bb58b90b1a8f753b582055adaf448214a8e22c31 upstream.

Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional
information can be supplied without setting userspace up to fail.  The
padding in the new kvm_userspace_memory_region2 structure will be used to
pass a file descriptor in addition to the userspace_addr, i.e. allow
userspace to point at a file descriptor and map memory into a guest that
is NOT mapped into host userspace.

Alternatively, KVM could simply add "struct kvm_userspace_memory_region2"
without a new ioctl(), but as Paolo pointed out, adding a new ioctl()
makes detection of bad flags a bit more robust, e.g. if the new fd field
is guarded only by a flag and not a new ioctl(), then a userspace bug
(setting a "bad" flag) would generate out-of-bounds access instead of an
-EINVAL error.

Cc: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-9-seanjc@google.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <likexu@tencent.com>
2024-09-12 07:19:36 +00:00
Like Xu 65fdb18aed config/x86: Add EROFS_FS and CACHEFILES for image-granularity acceleration
In the container-based deployment cases, CUBE believes that container image
distribution can be dramatically optimized, which relies on understanding
and reorganizing the container image format, and that tech needs in this
direction require a bit of kernel-space help, starting with kernel features
that are already upstream.

In terms of specific CONFIG choices, tk5 will use the same values as the
default configuration of fedora40-config-6.10.8-200.fc40.x86_64.

Requested-by: Changpeng Liu <changpeliu@tencent.com>
Signed-off-by: Like Xu <likexu@tencent.com>
2024-09-12 11:17:27 +08:00
aurelianliu b1e1aed588 config,x86: open edr
Enable DPC and EDR to turn on the PCIe eDPC function.
When a UCE occurs, eDPC can reset the device link and resume the device,
which is similar to hot-plugging the device.

Signed-off-by: Aurelianliu <aurelianliu@tencent.com>
2024-09-11 02:06:12 +00:00
Daniel Maslowski e16058ed64 riscv/purgatory: align riscv_kernel_entry
Fix CVE: CVE-2024-43868

[ Upstream commit fb197c5d2fd24b9af3d4697d0cf778645846d6d5 ]

When alignment handling is delegated to the kernel, everything must be
word-aligned in purgatory, since the trap handler is then set to the
kexec one. Without the alignment, hitting the exception would
ultimately crash. On other occasions, the kernel's handler would take
care of exceptions.
This has been tested on a JH7110 SoC with oreboot and its SBI delegating
unaligned access exceptions and the kernel configured to handle them.

Fixes: 736e30af58 ("RISC-V: Add purgatory")
Signed-off-by: Daniel Maslowski <cyrevolt@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240719170437.247457-1-cyrevolt@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-09-10 19:36:42 +08:00
Jianping Liu 7a6899b55a config,x86: disable CONFIG_IOMMU_DEBUGFS
To avoid a boot log like the one below:
[    0.095948] *************************************************************
[    0.095948] **     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **
[    0.096220] **                                                         **
[    0.096221] **  IOMMU DebugFS SUPPORT HAS BEEN ENABLED IN THIS KERNEL  **
[    0.096222] **                                                         **
[    0.096223] ** This means that this kernel is built to expose internal **
[    0.096224] ** IOMMU data structures, which may compromise security on **
[    0.096225] ** your system.                                            **
[    0.096227] **                                                         **
[    0.096227] ** If you see this message and you are not debugging the   **
[    0.096228] ** kernel, report this immediately to your vendor!         **
[    0.096229] **                                                         **
[    0.096230] **     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **
[    0.096231] *************************************************************
disable CONFIG_IOMMU_DEBUGFS.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-09-06 15:03:58 +08:00
Jianping Liu 64a21c8a25 hung_task,watchdog: set thresh time to 600 seconds
When CONFIG_KASAN is enabled, the kernel runs much slower, so set the
hung_task and soft lockup threshold times to 600 seconds.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-09-05 15:24:07 +08:00
Jianping Liu 2748b6ef40 Merge OCK next branch to TK5 master branch 2024-09-03 11:26:15 +08:00
Jianping Liu 41d84212f5 dist: release 6.6.47-12
Upstream: no

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-09-03 11:21:49 +08:00
Jianping Liu 5b4374c873 config,x86: set CONFIG_HW_RANDOM_ZHAOXIN to m
Most x86 CPUs don't need CONFIG_HW_RANDOM_ZHAOXIN, so change it from y
to m, which reduces the size of vmlinux.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-09-03 11:17:32 +08:00
Kairui Song 1789bec3a9 emm: fix panic in kdump
Upstream: no
Tested: Tested on Devcloud

Kdump environment disables memory cgroup so related helpers will all
return NULL. Abort early in such case.

Signed-off-by: Kairui Song <kasong@tencent.com>
2024-09-03 03:06:14 +00:00
Huang Cun 0ba277671b config: trace: enable CONFIG_FUNCTION_GRAPH_RETVAL
Signed-off-by: Huang Cun <cunhuang@tencent.com>
2024-09-03 03:04:10 +00:00
Ze Gao 2e2ffe48c5 rue/scx: Fix cgroupv2 cpu controller regression
Due to the odd behavior of gcc designated initializers, we
have to carefully order the fields inside cpu_cftypes;
otherwise some important interfaces like cpu.max could
be lost.

Check out the details in [1]

[1]: https://onlinegdb.com/T-AMLp4zw

Fixes: 8c320a09af ("rue/scx: Add cpu.offline to maintain SCHED_BT compatibility")
Fixes: 2b9d28baab ("rue/scx: Add cpu.scx to the cpu cgroup controller")
Reported-by: likexu <likexu@tencent.com>
Signed-off-by: Ze Gao <zegao@tencent.com>
2024-09-03 02:47:04 +00:00
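The commit above does not spell out which initializer behavior it hit. One standard C rule that can produce this kind of loss is that a positional entry following a designated entry continues from the designator's index, so it can silently replace an earlier element. The sketch below shows that rule on a plain string array. The array names and the helper `has_file()` are hypothetical, and the real `cpu_cftypes` table may be affected in a different way.

```c
#include <stddef.h>
#include <string.h>

/* Ordering that loses an entry: the designator rewinds the position. */
const char *const files_bad[] = {
	"cpu.weight",        /* index 0 */
	"cpu.max",           /* index 1 */
	[0] = "cpu.offline", /* replaces index 0 */
	"cpu.scx",           /* continues at index 1, replacing "cpu.max" */
};

/* Ordering that keeps every entry: designated entry first, rest follow. */
const char *const files_good[] = {
	[0] = "cpu.offline", /* index 0 */
	"cpu.scx",           /* index 1 */
	"cpu.weight",        /* index 2 */
	"cpu.max",           /* index 3 */
};

/* Return 1 if name is present among the first n entries, 0 otherwise. */
int has_file(const char *const *files, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (files[i] != NULL && strcmp(files[i], name) == 0)
			return 1;
	}
	return 0;
}
```

Here `files_bad` ends up with only two elements and no `"cpu.max"`, while `files_good` keeps all four. gcc reports such overrides only when `-Woverride-init` is enabled (it is part of `-Wextra`), which is one reason this kind of mistake can go unnoticed.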
leoliu-oc 95e99651a2 zhaoxin_rng: Remove redundant pr_err log after matching cpu_ids
On non-Zhaoxin platforms, log related to the zhaoxin rng driver should not
appear.

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-09-02 17:11:48 +08:00
Jianping Liu dbef74015d watchdog: increase watchdog_thresh max value to 300 in debug kernel
If CONFIG_KASAN or CONFIG_KCSAN is enabled, the system runs much
slower. Increase watchdog_thresh's max value to avoid soft lockups
or hung tasks when running heavy test suites.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-30 17:19:41 +08:00
Jianping Liu 42be2152a4 drivers,thirdparty: add backup url for mlnx driver
If fetching the mlnx driver from https://content.mellanox.com fails, use
the backup url for the mlnx driver.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-29 12:43:18 +08:00
Jianping Liu 198d728bcc dist: check sha256 if mlnx tgz is already exist
In dist/sources/download-and-copy-drivers.sh, if the mlnx tgz is larger
than 1024 bytes, it is assumed that the mlnx tgz really exists, and the
script returns 0 without checking sha256. Change it to always check sha256.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-29 10:21:59 +08:00
Jianping Liu 7a56ca3fca drivers,thirdparty: keep copy other thirdparty drivers if with_ofed is 0
If with_ofed is 0, only the mlnx driver uses the kernel native driver. The
other drivers in drivers/thirdparty are commercial-quality drivers and should
still be copied to override the kernel native drivers before the build.

Also fix the wrong kernel native bnxt directory in
drivers/thirdparty/copy-drivers.sh.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-28 21:37:46 +08:00
Jianping Liu c92c287ac7 drivers,mlnx: add sha256 check for MLNX tgz
To ensure the downloaded file is correct, add a sha256 check for the MLNX tgz.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-28 14:32:34 +08:00
Jianping Liu 3353ce662c dist: delete useless code in kernel.template.spec
Now that release-drivers is in tree, there is no need to check whether
drivers/thirdparty/release-drivers/mlnx exists.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-28 14:07:21 +08:00
Jianping Liu c99409f7fe Merge OCK next branch to TK5 master branch 2024-08-27 19:48:02 +08:00
leoliu-oc f12c637287 i2c/zhaoxin: switch i2c registration to devm functions
zhaoxin inclusion
category: feature

-------------------

Switch from i2c_add_adapter() to resource managed devm_i2c_add_adapter()
for matching rest of driver initialization, and more concise code.

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
2024-08-27 11:46:33 +00:00
Zheng Wu 3970408988 lkp: intel: selftests/bpf: Add netlink helper library
commit 51f1892b5289f0c09745d3bedb36493555d6d90c upstream.

Add a minimal netlink helper library for the BPF selftests. This has been
taken and cut down and cleaned up from iproute2. This covers basics such
as netdevice creation which we need for BPF selftests / BPF CI given
iproute2 package cannot cover it yet.

Stanislav Fomichev suggested that this could be replaced in future by ynl
tool generated C code once it has RTNL support to create devices. Once we
get to this point the BPF CI would also need to add libmnl. If no further
extensions are needed, a second option could be that we remove this code
again once iproute2 package has support.

Intel-SIG: commit 51f1892b5289 lkp: intel: selftests/bpf:
Add netlink helper library.
This patch is to fix this issue of building bpf failed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-7-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
[ Zheng Wu: amend commit log ]
Signed-off-by: Zheng Wu <wu.zheng@intel.com>
2024-08-27 11:46:12 +00:00
Jianping Liu c3c178d349 dist: fix make dist-srpm warning
Fix warning as below:
Macro expanded in comment on line 520: %{?dist}" != ".tl4" && "%{?dist}" != ".oc9" && "%{?dist}" != ".tl3"

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-27 19:35:15 +08:00
Jianping Liu 8b5cff9dfa dist: exit 1 if download mlnx commercial drivers fail
If downloading the mlnx commercial drivers fails, exit 1 to let users know
about it.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-27 19:32:42 +08:00
Jianping Liu 4eb6f1908f dist: not integrate mlnx commercial drivers in oc9
Some oc9 partners have their own NIC drivers that support RDMA and are
only compatible with the kernel native InfiniBand stack.

If the mlnx driver is integrated, those partners' RDMA NIC drivers could not run.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-27 19:01:43 +08:00
Jianping Liu 3db1e8157b drivers/thirdparty: put release-drivers in tree
Only the mlnx driver tgz is very large; the source of the other drivers
is small. So, remove the release-drivers submodule (sub git repo).

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-27 17:10:54 +08:00
Jianping Liu d791bba469 dist: add perl-sigtrap build requires
Building the mlnx drivers needs the perl-sigtrap rpm. Otherwise,
it fails with the errors below:
Can't locate sigtrap.pm in @INC (you may need to install the sigtrap module)
BEGIN failed--compilation aborted at ./install.pl line 44.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-27 16:12:17 +08:00
Jianping Liu 880dd21ab3 dist: release 6.6.47-11
Upstream: no

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-08-27 10:59:09 +08:00
Jianping Liu 2482dd4821 config: update x86 config without manual change
Update the configs with the following commands:
make dist-config
make savedefconfig
mv defconfig arch/x86/configs/tencent.config

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-26 20:05:30 +08:00
Jianping Liu 4d78ab6896 dist: fix mlnx driver compile error on oc9
Building the kernel rpm on oc9 fails with the error below:
cc1: error: code model kernel does not support PIC mode
OFED_topdir/BUILD/mlnx-ofa_kernel-23.10/obj/default/compat/main.o] Error 1

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-26 19:26:46 +08:00
Jianping Liu 63e2660c48 config,oc: support WLAN and MTD and more SND drivers
OpenCloud partners want to use wireless cards and sound cards, so enable
the configs that support them.

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
2024-08-26 16:33:47 +08:00
Jianping Liu 0569444d2a Merge linux 6.6.47
Conflicts:
	net/sunrpc/svc.c
2024-08-24 09:43:23 +08:00
Jianping Liu 0a76ebf09a Merge linux 6.6.46
Conflicts:
	drivers/platform/x86/intel/ifs/core.c
	drivers/platform/x86/intel/ifs/ifs.h
	kernel/sched/core.c
2024-08-24 09:37:59 +08:00
Jianping Liu e580bc83c2 Merge linux 6.6.45 2024-08-23 19:54:49 +08:00
Jianping Liu d6563b9042 Merge OCK next branch to TK5 master branch 2024-08-23 19:52:09 +08:00
frankjpliu 897ad8fab4 Merge branch 'zegao/scx3' into 'master' (merge request !150)
Add some general scx in-kernel support
5aec0abf10 rue/scx: Kill user tasks in SCHED_EXT when scheduler is gone
a1752a5760 rue/scx: Add readonly sysctl knob kernel.cpu_qos for SCHED_BT compatibility
ed0889e48a rue/scx: Add /proc/bt_stat to maintain SCHED_BT compatibility
8c320a09af rue/scx: Add cpu.offline to maintain SCHED_BT compatibility
2b9d28baab rue/scx: Add cpu.scx to the cpu cgroup controller
576ee0803a rue/scx: Add /proc/scx_stat to do scx cputime accounting
67d151255e rue/scx: Fix lockdep warn on printk with rq lock held
ebf91df4dc rue/scx: Reorder scx_fork_rwsem, cpu_hotplug_lock and scx_cgroup_rwsem
2024-08-23 11:40:38 +00:00
Yongliang Gao 44f5072e76 Revert "sched: adaptive default skew_tick value"
This reverts commit ca7d96bf43.

Maintain consistency and alignment with upstream, and this patch
is not very friendly to virtualization.

Signed-off-by: Yongliang Gao <leonylgao@tencent.com>
Reviewed-by: Jianping Liu <frankjpliu@tencent.com>
2024-08-23 11:32:30 +00:00
frankjpliu a1aa259039 Merge branch 'likexu/kvm/cube-optimization' into 'master' (merge request !158)
KVM optimization: skip srcu-sync && fix guest-tsc jump
Backport four KVM upstream commits for cube use cases:

```
d8992b97df KVM: x86: Don't sync user-written TSC against startup values
c4201bd24f4ae KVM: s390: Don't re-setup dummy routing when KVM_CREATE_IRQCHIP
e3c89f5dd11df KVM: x86: Don't re-setup empty IRQ routing when KVM_CAP_SPLIT_IRQCHIP
fbe4a7e881d44 KVM: Setup empty IRQ routing when creating a VM
```

For the top commit: KVM: x86: Don't sync user-written TSC against startup values

Add a flag, kvm->arch.user_set_tsc, protected by
kvm->arch.tsc_write_lock, to record that a TSC for at least one vCPU in
the VM *has* been set by userspace, and make the 1-second slop hack only
trigger if user_set_tsc is already set.

For the remaining commits: KVM: irqchip: synchronize srcu only if needed

We found that enabling KVM_CAP_SPLIT_IRQCHIP can occasionally take more
than 20 milliseconds on a host that already runs many VMs.

The reason is that when the VMM (QEMU/Cloud Hypervisor) enables
KVM_CAP_SPLIT_IRQCHIP, KVM calls synchronize_srcu_expedited(), which
may sleep, and the SRCU kworker can add some delay during this period.
A sensible fix is to set up empty IRQ routing when creating the VM, so
that x86/s390 don't need to set up empty/dummy IRQ routing.

Link: https://lore.kernel.org/all/20240506101751.3145407-1-foxywang@tencent.com/
Link: https://lore.kernel.org/r/20231008025335.7419-1-likexu@tencent.com
Signed-off-by: Like Xu <likexu@tencent.com>
2024-08-23 11:31:54 +00:00
Yongliang Gao 04ee57b1e7 kabi: Introduce CONFIG_KABI_RESERVE
Disable for images if kabi compatibility is explicitly not needed.

Signed-off-by: Yongliang Gao <leonylgao@tencent.com>
Reviewed-by: Jianping Liu <frankjpliu@tencent.com>
2024-08-23 11:27:49 +00:00
Damien Le Moal 77c6eb435d null_blk: Fix return value of nullb_device_power_store()
[ Upstream commit d9ff882b54f99f96787fa3df7cd938966843c418 ]

Fix CVE: CVE-2024-36478

When powering on a null_blk device that is not already on, the return
value ret that is initialized to be count is reused to check the return
value of null_add_dev(), leading to nullb_device_power_store() to return
null_add_dev() return value (0 on success) instead of "count".
So make sure to set ret to be equal to count when there are no errors.

Fixes: a2db328b0839 ("null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20240527043445.235267-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-08-23 11:24:47 +00:00
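The return-value bug described in the nullb_device_power_store() commit above follows a common pattern: one variable holds both the default return value and an intermediate status. A minimal user-space sketch, with hypothetical function names and a stub in place of null_add_dev():

```c
#include <stddef.h>

/* Hypothetical stand-in for null_add_dev(): returns 0 on success. */
int add_dev_stub(void)
{
	return 0;
}

/* Buggy pattern: ret starts as count but is overwritten by the status. */
long power_store_buggy(size_t count)
{
	long ret = (long)count;

	ret = add_dev_stub(); /* on success ret is now 0, count is lost */
	if (ret)
		return ret;       /* error path is fine */
	return ret;           /* success path returns 0 instead of count */
}

/* Fixed pattern: restore ret to count once no error occurred. */
long power_store_fixed(size_t count)
{
	long ret = (long)count;

	ret = add_dev_stub();
	if (ret)
		return ret;
	ret = (long)count;    /* report the full write as consumed */
	return ret;
}
```

For a 2-byte write, `power_store_buggy(2)` returns 0 while `power_store_fixed(2)` returns 2, which matches the "count" the commit message says the store function should return.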
Yu Kuai b5a603a720 null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
[ Upstream commit a2db328b0839312c169eb42746ec46fc1ab53ed2 ]

Fix CVE: CVE-2024-36478

Writing 'power' and 'submit_queues' concurrently will trigger kernel
panic:

Test script:

modprobe null_blk nr_devices=0
mkdir -p /sys/kernel/config/nullb/nullb0
while true; do echo 1 > submit_queues; echo 4 > submit_queues; done &
while true; do echo 1 > power; echo 0 > power; done

Test result:

BUG: kernel NULL pointer dereference, address: 0000000000000148
Oops: 0000 [#1] PREEMPT SMP
RIP: 0010:__lock_acquire+0x41d/0x28f0
Call Trace:
 <TASK>
 lock_acquire+0x121/0x450
 down_write+0x5f/0x1d0
 simple_recursive_removal+0x12f/0x5c0
 blk_mq_debugfs_unregister_hctxs+0x7c/0x100
 blk_mq_update_nr_hw_queues+0x4a3/0x720
 nullb_update_nr_hw_queues+0x71/0xf0 [null_blk]
 nullb_device_submit_queues_store+0x79/0xf0 [null_blk]
 configfs_write_iter+0x119/0x1e0
 vfs_write+0x326/0x730
 ksys_write+0x74/0x150

This is because del_gendisk() can run concurrently with
blk_mq_update_nr_hw_queues():

nullb_device_power_store	nullb_apply_submit_queues
 null_del_dev
 del_gendisk
				 nullb_update_nr_hw_queues
				  if (!dev->nullb)
				  // still set while gendisk is deleted
				   return 0
				  blk_mq_update_nr_hw_queues
 dev->nullb = NULL

Fix this problem by reusing the global mutex to protect
nullb_device_power_store() and nullb_update_nr_hw_queues() from configfs.

Fixes: 45919fbfe1 ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/all/CAHj4cs9LgsHLnjg8z06LQ3Pr5cax-+Ps+xT7AP7TPnEjStuwZA@mail.gmail.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20240523153934.1937851-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-08-23 11:24:47 +00:00