[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

Jason Gunthorpe jgg at nvidia.com
Wed Apr 5 17:23:43 UTC 2023


On Wed, Apr 05, 2023 at 10:52:15AM -0600, Alex Williamson wrote:
> On Wed, 5 Apr 2023 13:37:05 -0300
> Jason Gunthorpe <jgg at nvidia.com> wrote:
> 
> > On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
> > 
> > > But that kind of brings to light the question of what does the user do
> > > when they encounter this situation.  
> > 
> > What does it do now when it encounters a group_id it doesn't
> > understand? Userspace already doesn't know if the foreign group is
> > open or not, right?
> 
> It's simple, there is currently no screwiness around opened devices.
> If the caller doesn't own all the groups mapping to the affected
> devices, hot-reset is not available.

That still has nasty edge cases. If the reset group spans beyond a
single iommu group you end up with qemu being unable to operate reset
at all, and it is unfixable from an API perspective as we can't pass
in groups that VFIO isn't going to use.

I think you are right, the fact we'd have to return -1 dev_ids to this
modified API is pretty damaging, it doesn't seem like a good
direction.

> This leads to scenarios where the info ioctl indicates a hot-reset is
> initially available, perhaps only because one of the affected devices
> was not opened at the time, and now it fails when QEMU actually tries
> to use it.

I would like it if the APIs toward the kernel were only about the
kernel's security apparatus. It is makes it easier to reason about the
kernel side and gives nice simple well defined APIs.

This is a good point that qemu needs to make a policy decision if it
is happy about the VFIO configuration - but that is a policy decision
that should not become entangled with the kernel's security checks.

Today qemu can make this policy choice the same way it does right now
- call _INFO and check the group_ids. It gets the exact same outcome
as today. We already discussed that we need to expose the group ID
through an ioctl someplace.

If this is too awkward we could add a query to the kernel if the cdev
is "reset exclusive" - eg the iommufd covers all the groups that span
the reset set.

Jason


More information about the Intel-gfx mailing list