[PATCH 0/3] sample: vfio mdev display devices.
Kirti Wankhede
kwankhede at nvidia.com
Wed Apr 25 15:30:39 UTC 2018
On 4/25/2018 4:29 AM, Alex Williamson wrote:
> On Wed, 25 Apr 2018 01:20:08 +0530
> Kirti Wankhede <kwankhede at nvidia.com> wrote:
>
>> On 4/24/2018 3:10 AM, Alex Williamson wrote:
>>> On Wed, 18 Apr 2018 12:31:53 -0600
>>> Alex Williamson <alex.williamson at redhat.com> wrote:
>>>
>>>> On Mon, 9 Apr 2018 12:35:10 +0200
>>>> Gerd Hoffmann <kraxel at redhat.com> wrote:
>>>>
>>>>> This little series adds three drivers, for demo-ing and testing vfio
>>>>> display interface code. There is one mdev device for each interface
>>>>> type (mdpy.ko for region and mbochs.ko for dmabuf).
>>>>
>>>> Erik Skultety brought up a good question today regarding how libvirt is
>>>> meant to handle these different flavors of display interfaces and
>>>> knowing whether a given mdev device has display support at all. It
>>>> seems that we cannot simply use the default display=auto because
>>>> libvirt needs to specifically configure gl support for a dmabuf type
>>>> interface versus not having such a requirement for a region interface,
>>>> perhaps even removing the emulated graphics in some cases (though I
>>>> don't think we have boot graphics through either solution yet).
>>>> Additionally, GVT-g seems to need the x-igd-opregion support
>>>> enabled(?), which is a non-starter for libvirt as it's an experimental
>>>> option!
>>>>
>>>> Currently the only way to determine display support is through the
>>>> VFIO_DEVICE_QUERY_GFX_PLANE ioctl, but for libvirt to probe that on
>>>> their own they'd need to get to the point where they could open the
>>>> vfio device and perform the ioctl. That means opening a vfio
>>>> container, adding the group, setting the iommu type, and getting the
>>>> device. I was initially a bit appalled at asking libvirt to do that,
>>>> but the alternative is to put this information in sysfs, but doing that
>>>> we risk that we need to describe every nuance of the mdev device
>>>> through sysfs and it becomes a dumping ground for every possible
>>>> feature an mdev device might have.
>>>>
>>
>> One or two sysfs file for each feature shouldn't be that much of over
>> head? In kernel, other subsystem modules expose capability through
>> sysfs, like PCI subsystem adds 'boot_vga' file for VGA device which
>> returns 0/1 depending on if its boot VGA device. Similarly
>> 'd3cold_allowed', 'msi_bus'...
>
> Obviously we could add sysfs files, but unlike properties that the PCI
> core exposes about struct pci_dev fields, the idea of a vfio_device is
> much more abstract. Each bus driver creates its own device
> representation, so we have a top level vfio_device referencing through
> an opaque pointer a vfio_pci_device, vfio_platform_device, or
> mdev_device, and each mdev vendor driver creates its own private data
> structure below the mdev_device. So it's not quite a simple as one new
> attribute "show" function to handle all devices of that bus_type. We
> need a consistent implementation in each bus driver and vendor driver
> or we need to figure out how to percolate the information up to the
> vfio core. Your idea below seems to take the percolate approach.
>
>>>> So I was ready to return and suggest that maybe libvirt should probe
>>>> the device to know about these ancillary configuration details, but
>>>> then I remembered that both mdev vGPU vendors had external dependencies
>>>> to even allow probing the device. KVMGT will fail to open the device
>>>> if it's not associated with an instance of KVM and NVIDIA vGPU, I
>>>> believe, will fail if the vGPU manager process cannot find the QEMU
>>>> instance to extract the VM UUID. (Both of these were bad ideas)
>>>
>>> Here's another proposal that's really growing on me:
>>>
>>> * Fix the vendor drivers! Allow devices to be opened and probed
>>> without these external dependencies.
>>> * Libvirt uses the existing vfio API to open the device and probe the
>>> necessary ioctls, if it can't probe the device, the feature is
>>> unavailable, ie. display=off, no migration.
>>>
>>
>> I'm trying to think simpler mechanism using sysfs that could work for
>> any feature and knowing source-destination migration compatibility check
>> by libvirt before initiating migration.
>>
>> I have another proposal:
>> * Add a ioctl VFIO_DEVICE_PROBE_FEATURES
>> struct vfio_device_features {
>> __u32 argsz;
>> __u32 features;
>> }
>>
>> Define bit for each feature:
>> #define VFIO_DEVICE_FEATURE_DISPLAY_REGION (1 << 0)
>> #define VFIO_DEVICE_FEATURE_DISPLAY_DMABUF (1 << 1)
>> #define VFIO_DEVICE_FEATURE_MIGRATION (1 << 2)
>>
>> * Vendor driver returns bitmask of supported features during
>> initialization phase.
>>
>> * In vfio core module, trap this ioctl for each device in
>> vfio_device_fops_unl_ioctl(),
>
> Whoops, chicken and egg problem, VFIO_GROUP_GET_DEVICE_FD is our
> blocking point with mdev drivers, we can't get a device fd, so we can't
> call an ioctl on the device fd.
>
I'm sorry, I thought we could expose features when QEMU initialize, but
libvirt needs to know supported features before QEMU initialize.
>> check features bitmask returned by vendor
>> driver and add a sysfs file if feature is supported that device. This
>> sysfs file would return 0/1.
>
> I don't understand why we have an ioctl interface, if the user can get
> to the device fd then we have existing interfaces to probe these
> things, it seems like you're just wanting to pass a features bitmap
> through to vfio_add_group_dev() that vfio-core would expose through
> sysfs, but a list of feature bits doesn't convey enough info except for
> the most basic uses.
>
Yes, vfio_add_group_dev() seems to be better way to convey features to
vfio core.
>> For migration this bit will only indicate if host driver supports
>> migration feature.
>>
>> For source and destination compatibility check libvirt would need more
>> data/variables to check like,
>> * if same type of 'mdev_type' device create-able at destination,
>> i.e. if ('mdev_type'->available_instances > 0)
>>
>> * if host_driver_version at source and destination are compatible.
>> Host driver from same release branch should be mostly compatible, but if
>> there are major changes in structures or APIs, host drivers from
>> different branches might not be compatible, for example, if source and
>> destination are from different branches and one of the structure had
>> changed, then data collected at source might not be compatible with
>> structures at destination and typecasting it to changed structures would
>> mess up migrated data during restoration.
>
> Of course now you're asking that libvirt understand the release
> versioning scheme of every vendor driver and that it remain
> programatically consistent. We can't even do this with in-kernel
> drivers. And in the end, still the best we can do is guess.
>
Libvirt doesn't need to understand the version, libvirt need to do
strcmp version string from source and destination. If those are equal,
then libvirt would understand that they are compatible.
>> * if guest_driver_version is compatible with host driver at destination.
>> For mdev devices, guest driver communicates with host driver in some
>> form. If there are changes in structures/APIs of such communication,
>> guest driver at source might not be compatible with host driver at
>> destination.
>
> And another guess plus now the guest driver is involved which libvirt
> has no visibility to.
>
Like above libvirt need to do strcmp.
>> 'available_instances' sysfs already exist, later two should be added by
>> vendor driver which libvirt can use for migration compatibility check.
>
> As noted previously, display and migration are not necessarily
> mdev-only features, it's possible that vfio-pci or vfio-platform could
> also implement these, so the sysfs interface cannot be restricted to
> the mdev template and lifecycle interface.
>
I agree.
Feature bitmask passed to vfio core is not mdev specific. But here
'available_instances' for migration compatibility check is mdev
specific. If mdev device is not create-able at destination, there is no
point in initiating migration by libvirt.
> One more try... we have a vfio_group fd. This is created by the bus
> drivers calling vfio_add_group_dev() and registers a struct device, a
> struct vfio_device_ops, and private data. Typically we only wire the
> device_ops to the resulting file descriptor we get from
> VFIO_GROUP_GET_DEVICE_FD, but could we enable sort of a nested ioctl
> through the group fd? The ioctl would need to take a string arg to
> match to a device name, plus an ioctl cmd and arg for the device_ops
> ioctl. The group ioctl would need to filter cmds to known, benign
> queries. We'd also need to verify that the allowed ioctls have no
> dependencies on setup done in device_ops.open().
So these ioctls would be called without devices open() call, doesn't
this seem to be against file operations standard?
Thanks,
Kirti
> *_INFO and
> QUERY_GFX_PLANE ioctls would be the only candidates. Bus drivers could
> of course keep an open count in their private data so they know how the
> ioctl is being called (if necessary) and the group fd only allows a
> single open, so there's no risk that another user could interact with
> the group in bad ways once the device is opened (and of course we use
> file level access control on the group device file anyway). This is
> sort of a rethink of Paolo's suggestion of a read-only fd, but the fd
> is the existing group fd and any references to the device would only be
> held around the calling of the nested ioctl. Could it work? Thanks,
>
More information about the intel-gvt-dev
mailing list