OpenCloudOS-Kernel/Documentation
Sean Christopherson 3af4a9e61e KVM: x86: Serialize vendor module initialization (hardware setup)
Acquire a new mutex, vendor_module_lock, in kvm_x86_vendor_init() while
doing hardware setup to ensure that concurrent calls are fully serialized.
KVM rejects attempts to load vendor modules if a different module has
already been loaded, but doesn't handle the case where multiple vendor
modules are loaded at the same time, and module_init() doesn't run under
the global module_mutex.

Note, in practice, this is likely a benign bug as no platform exists that
supports both SVM and VMX, i.e. barring a weird VM setup, one of the
vendor modules is guaranteed to fail a support check before modifying
common KVM state.

Alternatively, KVM could perform an atomic CMPXCHG on .hardware_enable,
but that comes with its own ugliness as it would require setting
.hardware_enable before success is guaranteed, e.g. attempting to load
the "wrong" could result in spurious failure to load the "right" module.

Introduce a new mutex as using kvm_lock is extremely deadlock prone due
to kvm_lock being taken under cpus_write_lock(), and in the future, under
under cpus_read_lock().  Any operation that takes cpus_read_lock() while
holding kvm_lock would potentially deadlock, e.g. kvm_timer_init() takes
cpus_read_lock() to register a callback.  In theory, KVM could avoid
such problematic paths, i.e. do less setup under kvm_lock, but avoiding
all calls to cpus_read_lock() is subtly difficult and thus fragile.  E.g.
updating static calls also acquires cpus_read_lock().

Inverting the lock ordering, i.e. always taking kvm_lock outside
cpus_read_lock(), is not a viable option as kvm_lock is taken in various
callbacks that may be invoked under cpus_read_lock(), e.g. x86's
kvmclock_cpufreq_notifier().

The lockdep splat below is dependent on future patches to take
cpus_read_lock() in hardware_enable_all(), but as above, deadlock is
already is already possible.

  ======================================================
  WARNING: possible circular locking dependency detected
  6.0.0-smp--7ec93244f194-init2 #27 Tainted: G           O
  ------------------------------------------------------
  stable/251833 is trying to acquire lock:
  ffffffffc097ea28 (kvm_lock){+.+.}-{3:3}, at: hardware_enable_all+0x1f/0xc0 [kvm]

               but task is already holding lock:
  ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm]

               which lock already depends on the new lock.

               the existing dependency chain (in reverse order) is:

               -> #1 (cpu_hotplug_lock){++++}-{0:0}:
         cpus_read_lock+0x2a/0xa0
         __cpuhp_setup_state+0x2b/0x60
         __kvm_x86_vendor_init+0x16a/0x1870 [kvm]
         kvm_x86_vendor_init+0x23/0x40 [kvm]
         0xffffffffc0a4d02b
         do_one_initcall+0x110/0x200
         do_init_module+0x4f/0x250
         load_module+0x1730/0x18f0
         __se_sys_finit_module+0xca/0x100
         __x64_sys_finit_module+0x1d/0x20
         do_syscall_64+0x3d/0x80
         entry_SYSCALL_64_after_hwframe+0x63/0xcd

               -> #0 (kvm_lock){+.+.}-{3:3}:
         __lock_acquire+0x16f4/0x30d0
         lock_acquire+0xb2/0x190
         __mutex_lock+0x98/0x6f0
         mutex_lock_nested+0x1b/0x20
         hardware_enable_all+0x1f/0xc0 [kvm]
         kvm_dev_ioctl+0x45e/0x930 [kvm]
         __se_sys_ioctl+0x77/0xc0
         __x64_sys_ioctl+0x1d/0x20
         do_syscall_64+0x3d/0x80
         entry_SYSCALL_64_after_hwframe+0x63/0xcd

               other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(cpu_hotplug_lock);
                                 lock(kvm_lock);
                                 lock(cpu_hotplug_lock);
    lock(kvm_lock);

                *** DEADLOCK ***

  1 lock held by stable/251833:
   #0: ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:41:03 -05:00
..
ABI kernel hardening fixes for v6.2-rc1 2022-12-23 12:00:24 -08:00
PCI cxl for 6.2 2022-12-12 13:55:31 -08:00
RCU Updates for timers, timekeeping and drivers: 2022-12-12 12:52:02 -08:00
accel doc: add documentation for accel subsystem 2022-11-22 13:14:52 +02:00
accounting filemap: make the accounting of thrashing more consistent 2022-09-26 19:46:06 -07:00
admin-guide IOMMU Updates for Linux v6.2 2022-12-19 08:34:39 -06:00
arc
arm Documentation: arm: marvell: Add Orion codenames and archive homepage 2022-11-01 17:13:03 -06:00
arm64 Revert "arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption" 2022-12-15 17:59:12 +00:00
block blk-crypto: don't use struct request_queue for public interfaces 2022-11-21 11:39:05 -07:00
bpf docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE 2022-12-14 18:35:41 +01:00
cdrom
core-api hardening updates for v6.2-rc1 2022-12-14 12:20:00 -08:00
cpu-freq cpufreq: Remove CVS version control contents from documentation 2022-12-06 12:24:51 +01:00
crypto crypto: doc - use correct function name 2022-11-04 17:35:44 +08:00
dev-tools linux-kselftest-kunit-next-6.2-rc1 2022-12-12 16:42:57 -08:00
devicetree More sound updates for 6.2-rc1 2022-12-23 11:15:48 -08:00
doc-guide Merge branch 'alabaster-rb' into docs-mw 2022-10-18 16:29:50 -06:00
driver-api dmaengine updates for v6.2 2022-12-19 08:54:17 -06:00
fault-injection debugfs: fix error when writing negative value to atomic_t debugfs file 2022-11-30 16:13:16 -08:00
fb docs/fb: Document current named modes 2022-11-15 10:07:40 +01:00
features RISC-V Patches for the 6.2 Merge Window, Part 1 2022-12-14 15:23:49 -08:00
filesystems ntfs3 for 6.2 2022-12-21 10:18:17 -08:00
firmware-guide Merge branches 'acpi-misc', 'acpi-tools' and 'acpi-docs' 2022-10-03 20:03:49 +02:00
firmware_class
fpga
gpu drm/amdgpu: add GART, GPUVM, and GTT to glossary 2022-12-02 10:05:33 -05:00
hid
hwmon hwmon: (aquacomputer_d5next) Add support for Quadro flow sensor pulses 2022-12-04 16:45:03 -08:00
i2c docs: i2c: slave-interface: return errno when handle I2C_SLAVE_WRITE_REQUESTED 2022-09-28 21:41:59 +02:00
ia64 docs: ia64: Fix a typo ("identify mappings") 2022-11-09 14:03:51 -07:00
iio docs: iio: add documentation for BNO055 driver 2022-09-21 18:42:56 +01:00
images
infiniband
input Merge branch 'next' into for-linus 2022-10-09 22:30:23 -07:00
isdn
kbuild Documentation: kbuild: Add description of git for reproducible builds 2022-10-28 00:16:29 +09:00
kernel-hacking Updates for timers, timekeeping and drivers: 2022-12-12 12:52:02 -08:00
leds
litmus-tests
livepatch
locking Remove duplicate words inside documentation 2022-09-27 13:21:43 -06:00
loongarch This was a not-too-busy cycle for documentation; highlights include: 2022-12-12 17:18:50 -08:00
m68k
maintainer
mhi
mips
misc-devices
mm MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
netlabel
networking Documentation: devlink: add missing toc entry for etas_es58x devlink doc 2022-12-19 16:08:27 +01:00
nios2
nvdimm
openrisc
parisc
pcmcia
peci
power
powerpc docs: powerpc: add POWER9 and POWER10 to CPU families 2022-11-24 23:31:47 +11:00
process Kbuild updates for v6.2 2022-12-19 12:33:32 -06:00
riscv RISC-V Patches for the 6.2 Merge Window, Part 1 2022-12-14 15:23:49 -08:00
rust x86: enable initial Rust support 2022-09-28 09:02:45 +02:00
s390 vfio/mdev: embedd struct mdev_parent in the parent data structure 2022-10-04 12:06:58 -06:00
scheduler docs: scheduler: Update new path for the sysctl knobs 2022-09-27 13:21:42 -06:00
scsi scsi: core: Change the return type of .eh_timed_out() 2022-10-22 03:25:59 +00:00
security KEYS: encrypted: fix key instantiation with user-provided data 2022-10-19 13:01:23 -04:00
sh
sound
sparc
sphinx docs: sphinx-pre-install: don't require the RTD theme 2022-10-13 11:14:43 -06:00
sphinx-static docs: Don't wire font sizes for HTML output 2022-11-01 15:59:40 -06:00
spi
staging docs: put atomic*.txt and memory-barriers.txt into the core-api book 2022-09-29 12:55:06 -06:00
target
timers Documentation: Replace del_timer/del_timer_sync() 2022-11-24 15:09:11 +01:00
tools Documentation/rv: Add verification/rv man pages 2022-12-09 18:06:24 -05:00
trace Trace probes updates for 6.2: 2022-12-21 18:57:24 -08:00
translations This was a not-too-busy cycle for documentation; highlights include: 2022-12-12 17:18:50 -08:00
usb Documentation: USB: correct possessive "its" usage 2022-11-21 14:33:23 -07:00
userspace-api iommufd for 6.2 2022-12-14 09:15:43 -08:00
virt KVM: x86: Serialize vendor module initialization (hardware setup) 2022-12-29 15:41:03 -05:00
w1 Documentation: W1: minor typo corrections 2022-09-27 13:21:44 -06:00
watchdog
x86 Add TDX guest attestation infrastructure and driver 2022-12-12 14:27:49 -08:00
xtensa
.gitignore
Changes
CodingStyle
Kconfig
Makefile doc: add texinfodocs and infodocs targets 2022-11-21 14:13:57 -07:00
SubmittingPatches
arch.rst
atomic_bitops.txt
atomic_t.txt
conf.py docs/sphinx: More depth in the rtd sidebar toc 2022-11-09 13:28:40 -07:00
docutils.conf
dontdiff
index.rst Rust introduction for v6.1-rc1 2022-10-03 16:39:37 -07:00
memory-barriers.txt docs/memory-barriers.txt: Add a missed closing parenthesis 2022-10-18 15:14:52 -07:00
subsystem-apis.rst doc: add documentation for accel subsystem 2022-11-22 13:14:52 +02:00