/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Mediated device internal definitions
 *
 * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
 *     Author: Neo Jia <cjia@nvidia.com>
 *             Kirti Wankhede <kwankhede@nvidia.com>
 */

#ifndef MDEV_PRIVATE_H
#define MDEV_PRIVATE_H

int mdev_bus_register(void);
void mdev_bus_unregister(void);

struct mdev_parent {
	struct device *dev;
	const struct mdev_parent_ops *ops;
	struct kref ref;
	struct list_head next;
	struct kset *mdev_types_kset;
	struct list_head type_list;
	/* Synchronize device creation/removal with parent unregistration */
	struct rw_semaphore unreg_sem;
};

struct mdev_device {
	struct device dev;
	struct mdev_parent *parent;
	guid_t uuid;
	void *driver_data;
	struct list_head next;
	struct kobject *type_kobj;
	struct device *iommu_device;
	/*
	 * vfio/mdev: Check globally for duplicate devices
	 *
	 * When we create an mdev device, we check for duplicates against the
	 * parent device and return -EEXIST if found, but the mdev device
	 * namespace is global since we'll link all devices from the bus. We do
	 * catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
	 * with it comes a kernel warning and stack trace for trying to create
	 * duplicate sysfs links, which makes it an undesirable response.
	 *
	 * Therefore we should really be looking for duplicates across all mdev
	 * parent devices, or as implemented here, against our mdev device list.
	 * Using mdev_list to prevent duplicates means that we can remove
	 * mdev_parent.lock, but in order not to serialize mdev device creation
	 * and removal globally, we add mdev_device.active, which allows UUIDs
	 * to be reserved such that we can drop the mdev_list_lock before the
	 * mdev device is fully in place.
	 *
	 * Two behavioral notes. First, mdev_parent.lock had the side effect of
	 * serializing mdev create and remove ops per parent device. This was
	 * an implementation detail, not an intentional guarantee provided to
	 * the mdev vendor drivers. Vendor drivers can trivially provide this
	 * serialization internally if necessary.
	 *
	 * Second, review comments note the new -EAGAIN behavior when the
	 * device, and in particular the remove attribute, becomes visible in
	 * sysfs. If a remove is triggered prior to completion of
	 * mdev_device_create(), the user will see an -EAGAIN error. While the
	 * errno is different, receiving an error during this period is not:
	 * the previous implementation returned -ENODEV for the same condition.
	 * Furthermore, consistency for the user is improved in the case where
	 * mdev_device_remove_ops() returns an error. Previously, concurrent
	 * calls to mdev_device_remove() could see the device disappear with
	 * -ENODEV and then return in the case of error. Now a user would see
	 * -EAGAIN while the device is in this transitory state.
	 *
	 * Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
	 * Reviewed-by: Cornelia Huck <cohuck@redhat.com>
	 * Acked-by: Halil Pasic <pasic@linux.ibm.com>
	 * Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
	 * Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
	 * Date: 2018-05-16 03:53:55 +08:00
	 */
	bool active;
};

#define to_mdev_device(dev)	container_of(dev, struct mdev_device, dev)
#define dev_is_mdev(d)		((d)->bus == &mdev_bus_type)

struct mdev_type {
	struct kobject kobj;
	struct kobject *devices_kobj;
	struct mdev_parent *parent;
	struct list_head next;
	struct attribute_group *group;
};

#define to_mdev_type_attr(_attr)	\
	container_of(_attr, struct mdev_type_attribute, attr)
#define to_mdev_type(_kobj)		\
	container_of(_kobj, struct mdev_type, kobj)

int parent_create_sysfs_files(struct mdev_parent *parent);
void parent_remove_sysfs_files(struct mdev_parent *parent);

int mdev_create_sysfs_files(struct device *dev, struct mdev_type *type);
void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type);

int mdev_device_create(struct kobject *kobj,
		       struct device *dev, const guid_t *uuid);
/*
 * vfio/mdev: Improve the create/remove sequence
 *
 * This patch addresses the first two issues listed below and prepares the
 * code to address the third.
 *
 * 1. The mdev device is placed on the mdev bus before it is created in the
 *    vendor driver. Once a device is placed on the mdev bus without its
 *    supporting underlying vendor device having been created, the mdev
 *    driver's probe() gets triggered. However, there isn't a stable mdev
 *    available to work on.
 *
 *    create_store()
 *      mdev_device_create()
 *        device_register()
 *          ...
 *            vfio_mdev_probe()
 *        [...]
 *        parent->ops->create()
 *          vfio_ap_mdev_create()
 *            mdev_set_drvdata(mdev, matrix_mdev);
 *            // Valid pointer set above
 *
 *    Because of this initialization order, an mdev driver that wants to use
 *    the mdev doesn't have a valid mdev to work on.
 *
 * 2. The current creation sequence is:
 *      parent->ops->create()
 *      groups_register()
 *
 *    The remove sequence is:
 *      parent->ops->remove()
 *      groups_unregister()
 *
 *    However, the remove sequence should be an exact mirror of the creation
 *    sequence. Once this is achieved, all users of the mdev will be
 *    terminated first, before the underlying vendor device is removed
 *    (following the standard Linux driver model). At that point the vendor's
 *    remove() op shouldn't fail, because taking the device off the bus
 *    should terminate any usage.
 *
 * 3. When the remove operation fails, mdev sysfs removal attempts to add the
 *    file back on an already removed device. The following call trace [1]
 *    is observed.
 *
 * [1] call trace:
 * kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
 * kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
 * kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
 * kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
 * kernel: Call Trace:
 * kernel: remove_store+0xdc/0x100 [mdev]
 * kernel: kernfs_fop_write+0x113/0x1a0
 * kernel: vfs_write+0xad/0x1b0
 * kernel: ksys_write+0x5a/0xe0
 * kernel: do_syscall_64+0x5a/0x210
 * kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
 *
 * Therefore, the mdev core is improved in the following ways.
 *
 * 1. Split the device registration/deregistration sequence so that some
 *    things can be done between initialization of the device and hooking it
 *    up to the bus, and respectively after deregistering it from the bus but
 *    before giving up our final reference.
 *    In particular, this means invoking the ->create() and ->remove()
 *    callbacks in those new windows. This gives the vendor driver an
 *    initialized mdev device to work with during creation.
 *    At the same time, a bus driver that wishes to bind to the mdev driver
 *    also gets an initialized mdev device.
 *    This follows the standard Linux kernel bus and device model.
 *
 * 2. During the remove flow, first remove the device from the bus. This
 *    ensures that any bus-specific devices are removed.
 *    Once the device is taken off the mdev bus, invoke remove() of the mdev
 *    from the vendor driver.
 *
 * 3. The driver core device model provides a way to register and
 *    automatically unregister the device sysfs attribute groups at
 *    dev->groups. Make use of dev->groups to let the core create the groups,
 *    and eliminate the code for explicit group creation and removal.
 *
 * To check that the new sequence is solid, the stack dump below was taken
 * from a process that attempts to remove the device while the device is in
 * use by the vfio driver and a user application.
 * This stack dump validates that the vfio driver guards against such device
 * removal when the device is in use.
 *
 * cat /proc/21962/stack
 * [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
 * [<0>] mdev_remove+0x21/0x40 [mdev]
 * [<0>] device_release_driver_internal+0xe8/0x1b0
 * [<0>] bus_remove_device+0xf9/0x170
 * [<0>] device_del+0x168/0x350
 * [<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
 * [<0>] mdev_device_remove+0x8c/0xd0 [mdev]
 * [<0>] remove_store+0x71/0x90 [mdev]
 * [<0>] kernfs_fop_write+0x113/0x1a0
 * [<0>] vfs_write+0xad/0x1b0
 * [<0>] ksys_write+0x5a/0xe0
 * [<0>] do_syscall_64+0x5a/0x210
 * [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
 * [<0>] 0xffffffffffffffff
 *
 * This prepares the code to eliminate calling device_create_file() in a
 * subsequent patch.
 *
 * Reviewed-by: Cornelia Huck <cohuck@redhat.com>
 * Signed-off-by: Parav Pandit <parav@mellanox.com>
 * Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
 * Date: 2019-06-07 00:52:32 +08:00
 */
int mdev_device_remove(struct device *dev);

#endif /* MDEV_PRIVATE_H */