The vm_param and cpu_regs need to be freed via kfree()
before return -EINVAL error.
Fixes: 9c5137aedd ("virt: acrn: Introduce VM management interfaces")
Fixes: 2ad2aaee1b ("virt: acrn: Introduce an ioctl to set vCPU registers state")
Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
Link: https://lore.kernel.org/r/20220308092047.1008409-1-butterflyhuangxx@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
acrn_vm_ram_map can't pin the user pages with VM_PFNMAP flag
by calling get_user_pages_fast(), the PA(physical pages)
may be mapped by kernel driver and set PFNMAP flag.
This patch fixes logic to setup EPT mapping for PFN mapped RAM region
by checking the memory attribute before adding EPT mapping for them.
Fixes: 88f537d5e8 ("virt: acrn: Introduce EPT mapping management")
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
Link: https://lore.kernel.org/r/20220228022212.419406-1-yonghua.huang@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
acrn_irqfds_mutex is not used, never was.
Remove acrn_irqfds_mutex.
Fixes: aa3b483ff1 ("virt: acrn: Introduce irqfd")
Cc: Fei Li <fei1.li@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/YidLo57Kw/u/cpA5@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
VM Generation ID is a feature from Microsoft, described at
<https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
<https://aka.ms/win10rng>, as:
If the OS is running in a VM, there is a problem that most
hypervisors can snapshot the state of the machine and later rewind
the VM state to the saved state. This results in the machine running
a second time with the exact same RNG state, which leads to serious
security problems. To reduce the window of vulnerability, Windows
10 on a Hyper-V VM will detect when the VM state is reset, retrieve
a unique (not random) value from the hypervisor, and reseed the root
RNG with that unique value. This does not eliminate the
vulnerability, but it greatly reduces the time during which the RNG
system will produce the same outputs as it did during a previous
instantiation of the same VM state.
Linux has the same issue, and given that vmgenid is supported already by
multiple hypervisors, we can implement more or less the same solution.
So this commit wires up the vmgenid ACPI notification to the RNG's newly
added add_vmfork_randomness() function.
It can be used from qemu via the `-device vmgenid,guid=auto` parameter.
After setting that, use `savevm` in the monitor to save the VM state,
then quit QEMU, start it again, and use `loadvm`. That will trigger this
driver's notify function, which hands the new UUID to the RNG. This is
described in <https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/vmgenid.txt>.
And there are hooks for this in libvirt as well, described in
<https://libvirt.org/formatdomain.html#general-metadata>.
Note, however, that the treatment of this as a UUID is considered to be
an accidental QEMU nuance, per
<https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt>,
so this driver simply treats these bytes as an opaque 128-bit binary
blob, as per the spec. This doesn't really make a difference anyway,
considering that's how it ends up when handed to the RNG in the end.
Cc: Alexander Graf <graf@amazon.com>
Cc: Adrian Catangiu <adrian@parity.io>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Tested-by: Souradeep Chakrabarti <souradch.linux@gmail.com> # With Hyper-V's virtual hardware
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-----BEGIN PGP SIGNATURE-----
iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
=RKW4
-----END PGP SIGNATURE-----
Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
vsprintf: rework bitmap_list_string
lib: bitmap: add performance test for bitmap_print_to_pagebuf
bitmap: unify find_bit operations
mm/percpu: micro-optimize pcpu_is_populated()
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
find: micro-optimize for_each_{set,clear}_bit()
include/linux: move for_each_bit() macros from bitops.h to find.h
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
tools: sync tools/bitmap with mother linux
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
cpumask: use find_first_and_bit()
lib: add find_first_and_bit()
arch: remove GENERIC_FIND_FIRST_BIT entirely
include: move find.h from asm_generic to linux
bitops: move find_bit_*_le functions from le.h to find.h
bitops: protect find_first_{,zero}_bit properly
find_first{,_zero}_bit is a more effective analogue of 'next' version if
start == 0. This patch replaces 'next' with 'first' where things look
trivial.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Add KUnit tests for the contiguous physical memory regions merging
functionality from the Nitro Enclaves misc device logic.
We can build the test binary with the following configuration:
CONFIG_KUNIT=y
CONFIG_NITRO_ENCLAVES=m
CONFIG_NITRO_ENCLAVES_MISC_DEV_TEST=y
and install the nitro_enclaves module to run the testcases.
We'll see the following message using dmesg if everything goes well:
[...] # Subtest: ne_misc_dev_test
[...] 1..1
[...] (NULL device *): Physical mem region address is not 2 MiB aligned
[...] (NULL device *): Physical mem region size is not multiple of 2 MiB
[...] (NULL device *): Physical mem region address is not 2 MiB aligned
[...] ok 1 - ne_misc_dev_test_merge_phys_contig_memory_regions
[...] ok 1 - ne_misc_dev_test
Reviewed-by: Andra Paraschiv <andraprs@amazon.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20211107140918.2106-5-longpeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add the initial setup for the KUnit tests that will target the Nitro
Enclaves misc device functionality.
Reviewed-by: Andra Paraschiv <andraprs@amazon.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20211107140918.2106-4-longpeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sanity check the physical memory regions during the merge of contiguous
regions. Thus we can test the physical memory regions setup logic
individually, including the error cases coming from the sanity checks.
Reviewed-by: Andra Paraschiv <andraprs@amazon.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20211107140918.2106-3-longpeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There can be cases when there are more memory regions that need to be
set for an enclave than the maximum supported number of memory regions
per enclave. One example can be when the memory regions are backed by 2
MiB hugepages (the minimum supported hugepage size).
Let's merge the adjacent regions if they are physically contiguous. This
way the final number of memory regions is less than before merging and
could potentially avoid reaching maximum.
Reviewed-by: Andra Paraschiv <andraprs@amazon.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20211107140918.2106-2-longpeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.
Remove the redundant 'flush_workqueue()' calls.
This was generated with coccinelle:
@@
expression E;
@@
- flush_workqueue(E);
destroy_workqueue(E);
Reviewed-by: Andra Paraschiv <andraprs@amazon.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/d57f5c7e362837a8dfcde0d726a76b56f114e619.1636736947.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ACRN hypervisor can emulate a virtual device within hypervisor for a
Guest VM. The emulated virtual device can work without the ACRN
userspace after creation. The hypervisor do the emulation of that device.
To support the virtual device creating/destroying, HSM provides the
following ioctls:
- ACRN_IOCTL_CREATE_VDEV
Pass data struct acrn_vdev from userspace to the hypervisor, and inform
the hypervisor to create a virtual device for a User VM.
- ACRN_IOCTL_DESTROY_VDEV
Pass data struct acrn_vdev from userspace to the hypervisor, and inform
the hypervisor to destroy a virtual device of a User VM.
These new APIs will be used by user space code vm_add_hv_vdev and
vm_remove_hv_vdev in
https://github.com/projectacrn/acrn-hypervisor/blob/master/devicemodel/core/vmmapi.c
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
Link: https://lore.kernel.org/r/20210923084128.18902-3-fei1.li@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MMIO device passthrough enables an OS in a virtual machine to directly
access a MMIO device in the host. It promises almost the native
performance, which is required in performance-critical scenarios of
ACRN.
HSM provides the following ioctls:
- Assign - ACRN_IOCTL_ASSIGN_MMIODEV
Pass data struct acrn_mmiodev from userspace to the hypervisor, and
inform the hypervisor to assign a MMIO device to a User VM.
- De-assign - ACRN_IOCTL_DEASSIGN_PCIDEV
Pass data struct acrn_mmiodev from userspace to the hypervisor, and
inform the hypervisor to de-assign a MMIO device from a User VM.
These new APIs will be used by user space code vm_assign_mmiodev and
vm_deassign_mmiodev in
https://github.com/projectacrn/acrn-hypervisor/blob/master/devicemodel/core/vmmapi.c
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
Link: https://lore.kernel.org/r/20210923084128.18902-2-fei1.li@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Update the codebase formatting to fix the reports from the checkpatch
script, to match the open parenthesis.
Reviewed-by: George-Aurelian Popescu <popegeo@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20210827154930.40608-6-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Update the copyright statement to include 2021, as a change has been
made over this year.
Check commit d874742f6a ("nitro_enclaves: Set Bus Master for the NE
PCI device") for the codebase update from this file (ne_pci_dev.c).
Reviewed-by: George-Aurelian Popescu <popegeo@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20210827154930.40608-5-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the reported issue from the kernel-doc script, to have a comment per
identifier.
Reviewed-by: George-Aurelian Popescu <popegeo@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20210827154930.40608-4-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ACRN hypervisor has scenarios which could run a real-time guest VM.
The real-time guest VM occupies dedicated CPU cores, be assigned with
dedicated PCI devices. It can run without the Service VM after boot up.
hcall_destroy_vm() returns failure when a real-time guest VM refuses.
The clearing of flag ACRN_VM_FLAG_DESTROYED causes some kernel resource
double-freed in a later acrn_vm_destroy().
Do hcall_destroy_vm() before resource release to drop this chance to
destroy the VM if hypercall fails.
Fixes: 9c5137aedd ("virt: acrn: Introduce VM management interfaces")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
Link: https://lore.kernel.org/r/20210722062736.15050-1-fei1.li@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enable Bus Master for the NE PCI device, according to the PCI spec
for submitting memory or I/O requests:
Master Enable – Controls the ability of a PCI Express
Endpoint to issue Memory and I/O Read/Write Requests, and
the ability of a Root or Switch Port to forward Memory and
I/O Read/Write Requests in the Upstream direction
Cc: Andra Paraschiv <andraprs@amazon.com>
Cc: Alexandru Vasile <lexnv@amazon.com>
Cc: Alexandru Ciobotaru <alcioa@amazon.com>
Reviewed-by: Andra Paraschiv <andraprs@amazon.com>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20210621004046.1419-1-longpeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A failing usercopy of the slot uid will lead to a stale entry in the
file descriptor table as put_unused_fd() won't release it. This enables
userland to refer to a dangling 'file' object through that still valid
file descriptor, leading to all kinds of use-after-free exploitation
scenarios.
Exchanging put_unused_fd() for close_fd(), ksys_close() or alike won't
solve the underlying issue, as the file descriptor might have been
replaced in the meantime, e.g. via userland calling close() on it
(leading to a NULL pointer dereference in the error handling code as
'fget(enclave_fd)' will return a NULL pointer) or by dup2()'ing a
completely different file object to that very file descriptor, leading
to the same situation: a dangling file descriptor pointing to a freed
object -- just in this case to a file object of user's choosing.
Generally speaking, after the call to fd_install() the file descriptor
is live and userland is free to do whatever with it. We cannot rely on
it to still refer to our enclave object afterwards. In fact, by abusing
userfaultfd() userland can hit the condition without any racing and
abuse the error handling in the nitro code as it pleases.
To fix the above issues, defer the call to fd_install() until all
possible errors are handled. In this case it's just the usercopy, so do
it directly in ne_create_vm_ioctl() itself.
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210429165941.27020-2-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes the following sparse warning:
"sparse warnings: (new ones prefixed by >>)"
>> drivers/virt/acrn/irqfd.c:163:13: sparse: sparse: restricted __poll_t
degrades to integer
Fixes: dcf9625f2a ("virt: acrn: Use vfs_poll() instead of f_op->poll()")
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Link: https://lore.kernel.org/r/20210310074901.7486-1-yejune.deng@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use a more advanced function vfs_poll() in acrn_irqfd_assign().
At the same time, modify the definition of events.
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210221133306.33530-1-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Without cpu hotplug support, vCPU cannot be removed from a Service VM.
Don't expose remove_cpu sysfs when CONFIG_HOTPLUG_CPU disabled.
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Qais Yousef <qais.yousef@arm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210221134339.57851-2-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ACRN supports partition mode to achieve real-time requirements. In
partition mode, a CPU core can be dedicated to a vCPU of User VM. The
local APIC of the dedicated CPU core can be passthrough to the User VM.
The Service VM controls the assignment of the CPU cores.
Introduce an interface for the Service VM to remove the control of CPU
core from hypervisor perspective so that the CPU core can be a dedicated
CPU core of User VM.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-18-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
irqfd is a mechanism to inject a specific interrupt to a User VM using a
decoupled eventfd mechanism.
Vhost is a kernel-level virtio server which uses eventfd for interrupt
injection. To support vhost on ACRN, irqfd is introduced in HSM.
HSM provides ioctls to associate a virtual Message Signaled Interrupt
(MSI) with an eventfd. The corresponding virtual MSI will be injected
into a User VM once the eventfd got signal.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-17-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ioeventfd is a mechanism to register PIO/MMIO regions to trigger an
eventfd signal when written to by a User VM. ACRN userspace can register
any arbitrary I/O address with a corresponding eventfd and then pass the
eventfd to a specific end-point of interest for handling.
Vhost is a kernel-level virtio server which uses eventfd for signalling.
To support vhost on ACRN, ioeventfd is introduced in HSM.
A new I/O client dedicated to ioeventfd is associated with a User VM
during VM creation. HSM provides ioctls to associate an I/O region with
a eventfd. The I/O client signals a eventfd once its corresponding I/O
region is matched with an I/O request.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-16-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An I/O request of a User VM, which is constructed by hypervisor, is
distributed by the ACRN Hypervisor Service Module to an I/O client
corresponding to the address range of the I/O request.
I/O client maintains a list of address ranges. Introduce
acrn_ioreq_range_{add,del}() to manage these address ranges.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-15-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The C-states and P-states data are used to support CPU power management.
The hypervisor controls C-states and P-states for a User VM.
ACRN userspace need to query the data from the hypervisor to build ACPI
tables for a User VM.
HSM provides ioctls for ACRN userspace to query C-states and P-states
data obtained from the hypervisor.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-14-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ACRN userspace need to inject virtual interrupts into a User VM in
devices emulation.
HSM needs provide interfaces to do so.
Introduce following interrupt injection interfaces:
ioctl ACRN_IOCTL_SET_IRQLINE:
Pass data from userspace to the hypervisor, and inform the hypervisor
to inject a virtual IOAPIC GSI interrupt to a User VM.
ioctl ACRN_IOCTL_INJECT_MSI:
Pass data struct acrn_msi_entry from userspace to the hypervisor, and
inform the hypervisor to inject a virtual MSI to a User VM.
ioctl ACRN_IOCTL_VM_INTR_MONITOR:
Set a 4-Kbyte aligned shared page for statistics information of
interrupts of a User VM.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-13-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PCI device passthrough enables an OS in a virtual machine to directly
access a PCI device in the host. It promises almost the native
performance, which is required in performance-critical scenarios of
ACRN.
HSM provides the following ioctls:
- Assign - ACRN_IOCTL_ASSIGN_PCIDEV
Pass data struct acrn_pcidev from userspace to the hypervisor, and
inform the hypervisor to assign a PCI device to a User VM.
- De-assign - ACRN_IOCTL_DEASSIGN_PCIDEV
Pass data struct acrn_pcidev from userspace to the hypervisor, and
inform the hypervisor to de-assign a PCI device from a User VM.
- Set a interrupt of a passthrough device - ACRN_IOCTL_SET_PTDEV_INTR
Pass data struct acrn_ptdev_irq from userspace to the hypervisor,
and inform the hypervisor to map a INTx interrupt of passthrough
device of User VM.
- Reset passthrough device interrupt - ACRN_IOCTL_RESET_PTDEV_INTR
Pass data struct acrn_ptdev_irq from userspace to the hypervisor,
and inform the hypervisor to unmap a INTx interrupt of passthrough
device of User VM.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-12-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A User VM can access its virtual PCI configuration spaces via port IO
approach, which has two following steps:
1) writes address into port 0xCF8
2) put/get data in/from port 0xCFC
To distribute a complete PCI configuration space access one time, HSM
need to combine such two accesses together.
Combine two paired PIO I/O requests into one PCI I/O request and
continue the I/O request distribution.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-11-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An I/O request of a User VM, which is constructed by the hypervisor, is
distributed by the ACRN Hypervisor Service Module to an I/O client
corresponding to the address range of the I/O request.
For each User VM, there is a shared 4-KByte memory region used for I/O
requests communication between the hypervisor and Service VM. An I/O
request is a 256-byte structure buffer, which is 'struct
acrn_io_request', that is filled by an I/O handler of the hypervisor
when a trapped I/O access happens in a User VM. ACRN userspace in the
Service VM first allocates a 4-KByte page and passes the GPA (Guest
Physical Address) of the buffer to the hypervisor. The buffer is used as
an array of 16 I/O request slots with each I/O request slot being 256
bytes. This array is indexed by vCPU ID.
An I/O client, which is 'struct acrn_ioreq_client', is responsible for
handling User VM I/O requests whose accessed GPA falls in a certain
range. Multiple I/O clients can be associated with each User VM. There
is a special client associated with each User VM, called the default
client, that handles all I/O requests that do not fit into the range of
any other I/O clients. The ACRN userspace acts as the default client for
each User VM.
The state transitions of a ACRN I/O request are as follows.
FREE -> PENDING -> PROCESSING -> COMPLETE -> FREE -> ...
FREE: this I/O request slot is empty
PENDING: a valid I/O request is pending in this slot
PROCESSING: the I/O request is being processed
COMPLETE: the I/O request has been processed
An I/O request in COMPLETE or FREE state is owned by the hypervisor. HSM
and ACRN userspace are in charge of processing the others.
The processing flow of I/O requests are listed as following:
a) The I/O handler of the hypervisor will fill an I/O request with
PENDING state when a trapped I/O access happens in a User VM.
b) The hypervisor makes an upcall, which is a notification interrupt, to
the Service VM.
c) The upcall handler schedules a worker to dispatch I/O requests.
d) The worker looks for the PENDING I/O requests, assigns them to
different registered clients based on the address of the I/O accesses,
updates their state to PROCESSING, and notifies the corresponding
client to handle.
e) The notified client handles the assigned I/O requests.
f) The HSM updates I/O requests states to COMPLETE and notifies the
hypervisor of the completion via hypercalls.
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-10-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The HSM provides hypervisor services to the ACRN userspace. While
launching a User VM, ACRN userspace needs to allocate memory and request
the ACRN Hypervisor to set up the EPT mapping for the VM.
A mapping cache is introduced for accelerating the translation between
the Service VM kernel virtual address and User VM physical address.
>From the perspective of the hypervisor, the types of GPA of User VM can be
listed as following:
1) RAM region, which is used by User VM as system ram.
2) MMIO region, which is recognized by User VM as MMIO. MMIO region is
used to be utilized for devices emulation.
Generally, User VM RAM regions mapping is set up before VM started and
is released in the User VM destruction. MMIO regions mapping may be set
and unset dynamically during User VM running.
To achieve this, ioctls ACRN_IOCTL_SET_MEMSEG and ACRN_IOCTL_UNSET_MEMSEG
are introduced in HSM.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-9-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A virtual CPU of User VM has different context due to the different
registers state. ACRN userspace needs to set the virtual CPU
registers state (e.g. giving a initial registers state to a virtual
BSP of a User VM).
HSM provides an ioctl ACRN_IOCTL_SET_VCPU_REGS to do the virtual CPU
registers state setting. The ioctl passes the registers state from ACRN
userspace to the hypervisor directly.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-8-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The VM management interfaces expose several VM operations to ACRN
userspace via ioctls. For example, creating VM, starting VM, destroying
VM and so on.
The ACRN Hypervisor needs to exchange data with the ACRN userspace
during the VM operations. HSM provides VM operation ioctls to the ACRN
userspace and communicates with the ACRN Hypervisor for VM operations
via hypercalls.
HSM maintains a list of User VM. Each User VM will be bound to an
existing file descriptor of /dev/acrn_hsm. The User VM will be
destroyed when the file descriptor is closed.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-7-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ACRN Hypervisor Service Module (HSM) is a kernel module in Service VM
which communicates with ACRN userspace through ioctls and talks to ACRN
Hypervisor through hypercalls.
Add a basic HSM driver which allows Service VM userspace to communicate
with ACRN. The following patches will add more ioctls, guest VM memory
mapping caching, I/O request processing, ioeventfd and irqfd into this
module. HSM exports a char device interface (/dev/acrn_hsm) to userspace.
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yu Wang <yu1.wang@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20210207031040.49576-6-shuo.a.liu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not use wait_event_interruptible when vbg_hgcm_call() gets called from
kernel-context, such as it being called by the vboxsf filesystem code.
This fixes some filesystem related system calls on shared folders
unexpectedly failing with -EINTR.
Fixes: 0532a1b0d0 ("virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.x")
Reported-by: Ludovic Pouzenc <bugreports@pouzenc.fr>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210121150754.147598-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Update the assigned value of the poll result to be EPOLLHUP instead of
POLLHUP to match the __poll_t type.
While at it, simplify the logic of setting the mask result of the poll
function.
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20201102173622.32169-1-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add Makefile for the Nitro Enclaves driver, considering the option set
in the kernel config.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Remove -Wall flags, could use W=1 as an option for this.
v7 -> v8
* No changes.
v6 -> v7
* No changes.
v5 -> v6
* No changes.
v4 -> v5
* No changes.
v3 -> v4
* No changes.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
v1 -> v2
* Update path to Makefile to match the drivers/virt/nitro_enclaves
directory.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-16-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add kernel config entry for Nitro Enclaves, including dependencies.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* No changes.
v7 -> v8
* No changes.
v6 -> v7
* Remove, for now, the dependency on ARM64 arch. x86 is currently
supported, with Arm to come afterwards. The NE kernel driver can be
built for aarch64 arch.
v5 -> v6
* No changes.
v4 -> v5
* Add arch dependency for Arm / x86.
v3 -> v4
* Add PCI and SMP dependencies.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
v1 -> v2
* Update path to Kconfig to match the drivers/virt/nitro_enclaves
directory.
* Update help in Kconfig.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-15-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to setup
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.
Add logic for enclave termination, that is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* No changes.
v6 -> v7
* Remove the pci_dev_put() call as the NE misc device parent field is
used now to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Update documentation to kernel-doc format.
* Use directly put_page() instead of unpin_user_pages(), to match the
get_user_pages() calls.
v4 -> v5
* Release the reference to the NE PCI device on enclave fd release.
* Adapt the logic to cpumask enclave vCPU ids and CPU cores.
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
v1 -> v2
* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add early exit in release() if there was a slot alloc error in the fd
creation path.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-14-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After all the enclave resources are set, the enclave is ready for
beginning to run.
Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.
The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* Add check for invalid enclave CID value e.g. well-known CIDs and
parent VM CID.
* Add custom error code for incorrect flag in enclave start info and
invalid enclave CID.
v6 -> v7
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Check for invalid enclave start flags.
* Update documentation to kernel-doc format.
v4 -> v5
* Add early exit on enclave start ioctl function call error.
* Move sanity checks in the enclave start ioctl function, outside of the
switch-case block.
* Remove log on copy_from_user() / copy_to_user() failure.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Update the naming for the ioctl command from metadata to info.
* Check for minimum enclave memory size.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
v1 -> v2
* Add log pattern for NE.
* Check if enclave state is init when starting an enclave.
* Remove the BUG_ON calls.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-13-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Another resource that is being set for an enclave is memory. User space
memory regions, that need to be backed by contiguous memory regions,
are associated with the enclave.
One solution for allocating / reserving contiguous memory regions, that
is used for integration, is hugetlbfs. The user space process that is
associated with the enclave passes to the driver these memory regions.
The enclave memory regions need to be from the same NUMA node as the
enclave CPUs.
Add ioctl command logic for setting user space memory region for an
enclave.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* Add early check, while getting user pages, to be multiple of 2 MiB for
the pages that back the user space memory region.
* Add custom error code for incorrect user space memory region flag.
* Include in a separate function the sanity checks for each page of the
user space memory region.
v6 -> v7
* Update check for duplicate user space memory regions to cover
additional possible scenarios.
v5 -> v6
* Check for max number of pages allocated for the internal data
structure for pages.
* Check for invalid memory region flags.
* Check for aligned physical memory regions.
* Update documentation to kernel-doc format.
* Check for duplicate user space memory regions.
* Use directly put_page() instead of unpin_user_pages(), to match the
get_user_pages() calls.
v4 -> v5
* Add early exit on set memory region ioctl function call error.
* Remove log on copy_from_user() failure.
* Exit without unpinning the pages on NE PCI dev request failure as
memory regions from the user space range may have already been added.
* Add check for the memory region user space address to be 2 MiB
aligned.
* Update logic to not have a hardcoded check for 2 MiB memory regions.
v3 -> v4
* Check enclave memory regions are from the same NUMA node as the
enclave CPUs.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave max memory regions is reached when setting an enclave
memory region.
* Check if enclave state is init when setting an enclave memory region.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-12-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. After the memory regions are set, this
memory cannot be used anymore by the VM, being carved out.
Add ioctl command logic to get the offset in enclave memory where to
place the enclave image. Then the user space tooling copies the enclave
image in the memory using the given memory offset.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* No changes.
v7 -> v8
* Add custom error code for incorrect enclave image load info flag.
v6 -> v7
* No changes.
v5 -> v6
* Check for invalid enclave image load flags.
v4 -> v5
* Check for the enclave not being started when invoking this ioctl call.
* Remove log on copy_from_user() / copy_to_user() failure.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Set enclave image load offset based on flags.
* Update the naming for the ioctl command from metadata to info.
v2 -> v3
* No changes.
v1 -> v2
* New in v2.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-11-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An enclave, before being started, has its resources set. One of its
resources is CPU.
A NE CPU pool is set and enclave CPUs are chosen from it. Offline the
CPUs from the NE CPU pool during the pool setup and online them back
during the NE CPU pool teardown. The CPU offline is necessary so that
there would not be more vCPUs than physical CPUs available to the
primary / parent VM. In that case the CPUs would be overcommitted and
would change the initial configuration of the primary / parent VM of
having dedicated vCPUs to physical CPUs.
The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.
Add ioctl command logic for setting an enclave vCPU.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* No changes.
v6 -> v7
* Check for error return value when setting the kernel parameter string.
* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
* Calculate the number of threads per core and not use smp_num_siblings
that is x86 specific.
v5 -> v6
* Check CPUs are from the same NUMA node before going through CPU
siblings during the NE CPU pool setup.
* Update documentation to kernel-doc format.
v4 -> v5
* Set empty string in case of invalid NE CPU pool.
* Clear NE CPU pool mask on pool setup failure.
* Setup NE CPU cores out of the NE CPU pool.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
* Add check for maximum vCPU id possible before looking into the CPU
pool.
* Remove log on copy_from_user() / copy_to_user() failure and on admin
capability check for setting the NE CPU pool.
* Update the ioctl call to not create a file descriptor for the vCPU.
* Split the CPU pool usage logic in 2 separate functions - one to get a
CPU from the pool and the other to check the given CPU is available in
the pool.
v3 -> v4
* Setup the NE CPU pool at runtime via a sysfs file for the kernel
parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vCPU.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-10-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.
Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.
The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger e.g. when an enclave crashes.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* No changes.
v6 -> v7
* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Update the code base to init the ioctl function in this patch.
* Update documentation to kernel-doc format.
v4 -> v5
* Release the reference to the NE PCI device on create VM error.
* Close enclave fd on copy_to_user() failure; rename fd to enclave fd
while at it.
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
* Remove log on copy_to_user() failure.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
* Add metadata for the NUMA node for the enclave memory and CPUs.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open.
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-9-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Nitro Enclaves driver provides an ioctl interface to the user space
for enclave lifetime management e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.
This ioctl interface is mapped to a Nitro Enclaves misc device.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE misc device
in the NE PCI device driver logic.
v7 -> v8
* Add define for the CID of the primary / parent VM.
* Update the NE PCI driver shutdown logic to include misc device
deregister.
v6 -> v7
* Set the NE PCI device the parent of the NE misc device to be able to
use it in the ioctl logic.
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Remove the ioctl to query API version.
* Update documentation to kernel-doc format.
v4 -> v5
* Update the size of the NE CPU pool string from 4096 to 512 chars.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
pool is now setup at runtime, via a sysfs file for the kernel
parameter.
* Add minimum enclave memory size definition.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
paths.
* Remove file ops that do nothing for now - open and release.
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as full CPU
cores are given for the enclave(s).
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-8-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen e.g.
an enclave crashes. In this case, the Nitro Enclaves driver needs to be
aware of the event and notify the corresponding user space process that
abstracts the enclave.
Register an MSI-X interrupt vector to be used for this kind of
out-of-band events. The interrupt notifies that the state of an enclave
changed and the driver logic scans the state of each running enclave to
identify for which this notification is intended.
Create an workqueue to handle the out-of-band events. Notify user space
enclave process that is using a polling mechanism on the enclave fd.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the reference to the pdev directly from the ne_pci_dev instead of
the one from the enclave data structure.
v7 -> v8
* No changes.
v6 -> v7
* No changes.
v5 -> v6
* Update documentation to kernel-doc format.
v4 -> v5
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
paths.
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-7-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Nitro Enclaves PCI device exposes a MMIO space that this driver
uses to submit command requests and to receive command replies e.g. for
enclave creation / termination or setting enclave resources.
Add logic for handling PCI device command requests based on the given
command type.
Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication events.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* No changes.
v7 -> v8
* Update function signature for submit request and retrive reply
functions as they only returned 0, no error code.
* Include command type value in the error logs of ne_do_request().
v6 -> v7
* No changes.
v5 -> v6
* Update documentation to kernel-doc format.
v4 -> v5
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
paths.
v1 -> v2
* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add fix for kbuild report:
https://lore.kernel.org/lkml/202004231644.xTmN4Z1z%25lkp@intel.com/
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-6-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.
Setup the PCI device driver and add support for MSI-X interrupts.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Init the reference to the ne_pci_dev in the ne_devs data structure.
v7 -> v8
* Add NE PCI driver shutdown logic.
v6 -> v7
* No changes.
v5 -> v6
* Update documentation to kernel-doc format.
v4 -> v5
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Update NE PCI driver name to "nitro_enclaves".
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
paths.
* Update kzfree() calls to kfree().
v1 -> v2
* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data structure and
then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-5-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Nitro Enclaves driver keeps an internal info per each enclave.
This is needed to be able to manage enclave resources state, enclave
notifications and have a reference of the PCI device that handles
command requests for enclave lifetime management.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Add data structure to keep references to both Nitro Enclaves misc and
PCI devices.
v7 -> v8
* No changes.
v6 -> v7
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Update documentation to kernel-doc format.
* Include in the enclave memory region data structure the user space
address and size for duplicate user space memory regions checks.
v4 -> v5
* Include enclave cores field in the enclave metadata.
* Update the vCPU ids data structure to be a cpumask instead of a list.
v3 -> v4
* Add NUMA node field for an enclave metadata as the enclave memory and
CPUs need to be from the same NUMA node.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
v1 -> v2
* Add enclave memory regions and vcpus count for enclave bookkeeping.
* Update ne_state comments to reflect NE_START_ENCLAVE ioctl naming
update.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-4-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Nitro Enclaves (NE) driver communicates with a new PCI device, that
is exposed to a virtual machine (VM) and handles commands meant for
handling enclaves lifetime e.g. creation, termination, setting memory
regions. The communication with the PCI device is handled using a MMIO
space and MSI-X interrupts.
This device communicates with the hypervisor on the host, where the VM
that spawned the enclave itself runs, e.g. to launch a VM that is used
for the enclave.
Define the MMIO space of the NE PCI device, the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the function for the PCI device
command requests handling.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Fix indent for the NE PCI device command types enum.
v7 -> v8
* No changes.
v6 -> v7
* Update the documentation to include references to the NE PCI device id
and MMIO bar.
v5 -> v6
* Update documentation to kernel-doc format.
v4 -> v5
* Add a TODO for including flags in the request to the NE PCI device to
set a memory region for an enclave. It is not used for now.
v3 -> v4
* Remove the "packed" attribute and include padding in the NE data
structures.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
v1 -> v2
* Update path naming to drivers/virt/nitro_enclaves.
* Update NE_ENABLE_OFF / NE_ENABLE_ON defines.
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20200921121732.44291-3-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
First, when memory allocation for sg_list_unaligned failed, there
is a bug of calling put_pages() as we haven't pinned any pages.
Second, if get_user_pages_fast() failed we should unpin num_pinned
pages.
This will address both.
As part of these changes, minor update in documentation.
Fixes: 6db7199407 ("drivers/virt: introduce Freescale hypervisor management driver")
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: https://lore.kernel.org/r/1598995271-6755-1-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The session lock is a mutex, not a spinlock, fix the comments to match.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-9-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Every now and then upstream adds new ioctls without notifying us,
log unknown ioctl requests as an error to catch these.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-8-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream VirtualBox has defined and is using a few new request types for
vmmdev requests passed through /dev/vboxguest to the hypervisor.
Add the defines for these to vbox_vmmdev_types.h and add add them to the
whitelists of vmmdev requests which userspace is allowed to make.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1789545
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-7-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add support for the new VBG_IOCTL_ACQUIRE_GUEST_CAPABILITIES ioctl, this
is necessary for automatic resizing of the guest resolution to match the
VM-window size to work with the new VMSVGA virtual GPU which is now the
new default in VirtualBox.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1789545
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-6-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add vbg_set_host_capabilities() helper function, this is a preparation
patch for adding support for the VBGL_IOCTL_GUEST_CAPS_ACQUIRE ioctl.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-5-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rename guest_caps[_tracker] struct members to set_guest_caps[_tracker]
this is a preparation patch for adding support for the
VBGL_IOCTL_GUEST_CAPS_ACQUIRE ioctl.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-4-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Check the passed in capabilities against VMMDEV_GUEST_CAPABILITIES_MASK
instead of against VMMDEV_EVENT_VALID_EVENT_MASK.
This tightens the allowed mask from 0x7ff to 0x7.
Fixes: 0ba002bc43 ("virt: Add vboxguest driver for Virtual Box Guest integration")
Cc: stable@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-3-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Until this commit the mainline kernel version (this version) of the
vboxguest module contained a bug where it defined
VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG using
_IOC(_IOC_READ | _IOC_WRITE, 'V', ...) instead of
_IO(V, ...) as the out of tree VirtualBox upstream version does.
Since the VirtualBox userspace bits are always built against VirtualBox
upstream's headers, this means that so far the mainline kernel version
of the vboxguest module has been failing these 2 ioctls with -ENOTTY.
I guess that VBGL_IOCTL_VMMDEV_REQUEST_BIG is never used causing us to
not hit that one and sofar the vboxguest driver has failed to actually
log any log messages passed it through VBGL_IOCTL_LOG.
This commit changes the VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG
defines to match the out of tree VirtualBox upstream vboxguest version,
while keeping compatibility with the old wrong request defines so as
to not break the kernel ABI in case someone has been using the old
request defines.
Fixes: f6ddd094f5 ("virt: Add vboxguest driver for Virtual Box Guest integration UAPI")
Cc: stable@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-2-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.
This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.
There are a variety of indentation styles found.
a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'
In order to convert all of them to 1 tab + 'help', I ran the
following commend:
$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Through a labyrinthian sequence of includes, usage of page_to_phys(),
virt_to_phys() and out*() is dependent on the include of asm/io.h in
x86's asm/realmode.h, which is included in x86's asm/acpi.h and thus by
linux/acpi.h. Explicitly include linux/io.h to break the dependency on
realmode.h so that a future patch can remove the realmode.h include from
acpi.h without breaking the build.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/r/20191126165417.22423-8-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need support
for time64_t.
In completely unrelated work, I spent time on cleaning up parts of this
file in the past, moving things out into drivers instead.
After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the rest
of it and move it all into drivers.
This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which is
the scsi ioctl handlers. I have patches for those as well, but they need
more testing or possibly a rewrite.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1
RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ
+ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF
lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL
6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD
dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH
VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb
YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT
m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm
TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n
e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl
bN65armTm7bFFR32Avnu
=lgCl
-----END PGP SIGNATURE-----
Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
"As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need
support for time64_t.
In completely unrelated work, I spent time on cleaning up parts of
this file in the past, moving things out into drivers instead.
After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the
rest of it and move it all into drivers.
This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which
is the scsi ioctl handlers. I have patches for those as well, but they
need more testing or possibly a rewrite"
* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
scsi: sd: enable compat ioctls for sed-opal
pktcdvd: add compat_ioctl handler
compat_ioctl: move SG_GET_REQUEST_TABLE handling
compat_ioctl: ppp: move simple commands into ppp_generic.c
compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
compat_ioctl: unify copy-in of ppp filters
tty: handle compat PPP ioctls
compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
compat_ioctl: handle SIOCOUTQNSD
af_unix: add compat_ioctl support
compat_ioctl: reimplement SG_IO handling
compat_ioctl: move WDIOC handling into wdt drivers
fs: compat_ioctl: move FITRIM emulation into file systems
gfs2: add compat_ioctl support
compat_ioctl: remove unused convert_in_user macro
compat_ioctl: remove last RAID handling code
compat_ioctl: remove /dev/raw ioctl translation
compat_ioctl: remove PCI ioctl translation
compat_ioctl: remove joystick ioctl translation
...
The .ioctl and .compat_ioctl file operations have the same prototype so
they can both point to the same function, which works great almost all
the time when all the commands are compatible.
One exception is the s390 architecture, where a compat pointer is only
31 bit wide, and converting it into a 64-bit pointer requires calling
compat_ptr(). Most drivers here will never run in s390, but since we now
have a generic helper for it, it's easy enough to use it consistently.
I double-checked all these drivers to ensure that all ioctl arguments
are used as pointers or are ignored, but are not interpreted as integer
values.
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In hgcm_call_preprocess_linaddr memory is allocated for bounce_buf but
is not released if copy_form_user fails. In order to prevent memory leak
in case of failure, the assignment to bounce_buf_ret is moved before the
error check. This way the allocated bounce_buf will be released by the
caller.
Fixes: 579db9d45c ("virt: Add vboxguest VMMDEV communication code")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20190930204223.3660-1-navid.emamdoost@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "param.count" value is a u64 thatcomes from the user. The code
later in the function assumes that param.count is at least one and if
it's not then it leads to an Oops when we dereference the ZERO_SIZE_PTR.
Also the addition can have an integer overflow which would lead us to
allocate a smaller "pages" array than required. I can't immediately
tell what the possible run times implications are, but it's safest to
prevent the overflow.
Link: http://lkml.kernel.org/r/20181218082129.GE32567@kadam
Fixes: 6db7199407 ("drivers/virt: introduce Freescale hypervisor management driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Timur Tabi <timur@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
strndup_user() returns error pointers on error, and then in the error
handling we pass the error pointers to kfree(). It will cause an Oops.
Link: http://lkml.kernel.org/r/20181218082003.GD32567@kadam
Fixes: 6db7199407 ("drivers/virt: introduce Freescale hypervisor management driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Timur Tabi <timur@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.
This patch does not change any functionality. New functionality will
follow in subsequent patches.
Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.
NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted. This breaks the current
GUP call site convention of having the returned pages be the final
parameter. So the suggestion was rejected.
Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Userspace can make host function calls, called hgcm-calls through the
/dev/vboxguest device.
In this case we should not accept all hgcm-function-parameter-types, some
are only valid for in kernel calls.
This commit adds proper hgcm-function-parameter-type validation to the
ioctl for doing a hgcm-call from userspace.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
VirtualBox 6.0.x has a new feature where the guest kernel driver passes
info about the origin of the request (e.g. userspace or kernelspace) to
the hypervisor.
If we do not pass this information then when running the 6.0.x userspace
guest-additions tools on a 6.0.x host, some requests will get denied
with a VERR_VERSION_MISMATCH error, breaking vboxservice.service and
the mounting of shared folders marked to be auto-mounted.
This commit implements passing the requestor info to the host, fixing this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warning:
drivers/virt/vboxguest/vboxguest_core.c: In function ‘vbg_core_ioctl’:
drivers/virt/vboxguest/vboxguest_core.c:1486:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
f32bit = true;
~~~~~~~^~~~~~
drivers/virt/vboxguest/vboxguest_core.c:1489:2: note: here
case VBG_IOCTL_HGCM_CALL(0):
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With the new CONFIG_CC_OPTIMIZE_FOR_DEBUGGING option, we get a link
error in the vboxguest driver, when that fails to optimize out the
call to the compat handler:
drivers/virt/vboxguest/vboxguest_core.o: In function `vbg_ioctl_hgcm_call':
vboxguest_core.c:(.text+0x1f6e): undefined reference to `vbg_hgcm_call32'
Another compile-time check documents better what we want and avoids
the error.
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In vbg_misc_device_ioctl(), the header of the ioctl argument is copied from
the userspace pointer 'arg' and saved to the kernel object 'hdr'. Then the
'version', 'size_in', and 'size_out' fields of 'hdr' are verified.
Before this commit, after the checks a buffer for the entire request would
be allocated and then all data including the verified header would be
copied from the userspace 'arg' pointer again.
Given that the 'arg' pointer resides in userspace, a malicious userspace
process can race to change the data pointed to by 'arg' between the two
copies. By doing so, the user can bypass the verifications on the ioctl
argument.
This commit fixes this by using the already checked copy of the header
to fill the header part of the allocated buffer and only copying the
remainder of the data from userspace.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This was the only error path during probe without a message being logged
about what went wrong, this fixes this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not possible to get DMA32 zone memory through kmalloc, causing
the vboxguest driver to malfunction due to getting memory above
4G which the PCI device cannot handle.
This commit changes the kmalloc calls where the 4G limit matters to
using __get_free_pages() fixing vboxguest not working on x86_64 guests
with more then 4G RAM.
Cc: stable@vger.kernel.org
Reported-by: Eloy Coto Pereiro <eloy.coto@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a preparation patch for fixing issues on x86_64 virtual-machines
with more then 4G of RAM, atm we pass __GFP_DMA32 to kmalloc, but kmalloc
does not honor that, so we need to switch to get_pages, which means we
will not be able to use kfree to free memory allocated with vbg_alloc_req.
While at it also remove a comment on a vbg_alloc_req call which talks
about Windows (inherited from the vbox upstream cross-platform code).
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the declarations of functions from vboxguest_utils.c which are only
meant for vboxguest internal use from include/linux/vbox_utils.h to
drivers/virt/vboxguest/vboxguest_core.h.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is the big pull request for char/misc drivers for 4.16-rc1.
There's a lot of stuff in here. Three new driver subsystems were added
for various types of hardware busses:
- siox
- slimbus
- soundwire
as well as a new vboxguest subsystem for the VirtualBox hypervisor
drivers.
There's also big updates from the FPGA subsystem, lots of Android binder
fixes, the usual handful of hyper-v updates, and lots of other smaller
driver updates.
All of these have been in linux-next for a long time, with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLuZw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynS4QCcCrPmwfD5PJwaF+q2dPfyKaflkQMAn0x6Wd+u
Gw3Z2scgjETUpwJ9ilnL
=xcQ0
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big pull request for char/misc drivers for 4.16-rc1.
There's a lot of stuff in here. Three new driver subsystems were added
for various types of hardware busses:
- siox
- slimbus
- soundwire
as well as a new vboxguest subsystem for the VirtualBox hypervisor
drivers.
There's also big updates from the FPGA subsystem, lots of Android
binder fixes, the usual handful of hyper-v updates, and lots of other
smaller driver updates.
All of these have been in linux-next for a long time, with no reported
issues"
* tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (155 commits)
char: lp: use true or false for boolean values
android: binder: use VM_ALLOC to get vm area
android: binder: Use true and false for boolean values
lkdtm: fix handle_irq_event symbol for INT_HW_IRQ_EN
EISA: Delete error message for a failed memory allocation in eisa_probe()
EISA: Whitespace cleanup
misc: remove AVR32 dependencies
virt: vbox: Add error mapping for VERR_INVALID_NAME and VERR_NO_MORE_FILES
soundwire: Fix a signedness bug
uio_hv_generic: fix new type mismatch warnings
uio_hv_generic: fix type mismatch warnings
auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
uio_hv_generic: add rescind support
uio_hv_generic: check that host supports monitor page
uio_hv_generic: create send and receive buffers
uio: document uio_hv_generic regions
doc: fix documentation about uio_hv_generic
vmbus: add monitor_id and subchannel_id to sysfs per channel
vmbus: fix ABI documentation
uio_hv_generic: use ISR callback method
...
Add error mapping for VERR_INVALID_NAME and VERR_NO_MORE_FILES vbox status
codes, these are both used by the vboxsf (shared folder) code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
resource_size_t may be larger than pointers depending on configuration,
so we can run into this build warning:
drivers/virt/vboxguest/vboxguest_linux.c: In function 'vbg_pci_probe':
drivers/virt/vboxguest/vboxguest_linux.c:295:4: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
drivers/virt/vboxguest/vboxguest_linux.c:367:4: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
This uses the special %pap to print the address by reference.
Fixes: 0ba002bc43 ("virt: Add vboxguest driver for Virtual Box Guest integration")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit adds a driver for the Virtual Box Guest PCI device used in
Virtual Box virtual machines. Enabling this driver will add support for
Virtual Box Guest integration features such as copy-and-paste, seamless
mode and OpenGL pass-through.
This driver also offers vboxguest IPC functionality which is needed
for the vboxfs driver which offers folder sharing support.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commits adds a header describing the hardware interface for the
Virtual Box Guest PCI device used in Virtual Box virtual machines and
utility functions for talking to the Virtual Box hypervisor over this
interface.
These utility functions will used both by the vboxguest driver for the
PCI device which offers the /dev/vboxguest ioctl API and by the vboxfs
driver which offers folder sharing support.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.
Signed-off-by: Rob Herring <robh@kernel.org>
Moving from get_user_pages() to get_user_pages_unlocked() simplifies the
code and takes advantage of VM_FAULT_RETRY functionality when faulting
in pages.
Link: http://lkml.kernel.org/r/20161101194332.23961-1-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>