[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

Alex Williamson alex.williamson at redhat.com
Fri Apr 14 17:10:43 UTC 2023


On Fri, 14 Apr 2023 11:38:24 +0000
"Liu, Yi L" <yi.l.liu at intel.com> wrote:

> > From: Tian, Kevin <kevin.tian at intel.com>
> > Sent: Friday, April 14, 2023 5:12 PM
> >   
> > > From: Alex Williamson <alex.williamson at redhat.com>
> > > Sent: Friday, April 14, 2023 2:07 AM
> > >
> > > We had already iterated a proposal where the group-id is replaced with
> > > a dev-id in the existing ioctl and a flag indicates when the return
> > > value is a dev-id vs group-id.  This had a gap that userspace cannot
> > > determine if a reset is available given this information since un-owned
> > > devices report an invalid dev-id and userspace can't know if it has
> > > implicit ownership.  
> >  
> > >
> > > It seems cleaner to me though that we would could still re-use INFO in
> > > a similar way, simply defining a new flag bit which is valid only in
> > > the case of returning dev-ids and indicates if the reset is available.
> > > Therefore in one ioctl, userspace knows if hot-reset is available
> > > (based on a kernel determination) and can pull valid dev-ids from the  
> 
> Need to confirm the meaning of hot-reset available flag. I think it
> should at least meet below two conditions to set this flag. Although
> it may not mean hot-reset is for sure to succeed. (but should be
> a high chance).
> 
> 1) dev_set is resettable (all affected device are in dev_set)
> 2) affected device are owned by the current user

Per thread with Kevin, ownership can't always be known by the kernel.
Beyond the group vs cdev discussion there, isn't it also possible
(though perhaps not recommended) that a user can have multiple iommufd
ctxs?  So I think 2) becomes "ownership of the affected dev-set can be
inferred from the iommufd_ctx of the calling device", iow, the
null-array calling model is available and the flag is redefined to
match.  Reset may still be available via the proof-of-ownership model.
 
> Also, we need to has assumption that below two cases are rare
> if user encounters it, it just bad luck for them. I think the existing
> _INFO and hot-reset already has such assumption. So cdev mode
> can adopt it as well.
> 
> a) physical topology change (e.g. new devices plugged to affected slot)
> b) an affected device is unbound from vfio

Yes, these are sufficiently rare that we can't do much about them.

> > So the kernel needs to compare the group id between devices with
> > valid dev-ids and devices with invalid dev-ids to decide the implicit
> > ownership. For noiommu device which has no group_id when
> > VFIO_GROUP is off then it's resettable only if having a valid dev_id.  
> 
> In cdev mode, noiommu device doesn't have dev_id as it is not
> bound to valid iommufd. So if VFIO_GROUP is off, we may never
> allow hot-reset for noiommu devices. But we don't want to have
> regression with noiommu devices. Perhaps we may define the usage
> of the resettable flag like this:
> 1) if it is set, user does not need to own all the affected devices as
>     some of them may have been owned implicitly. Kernel should have
>     checked it.
> 2) if the flag is not set, that means user needs to check ownership
>     by itself. It needs to own all the affected devices. If not, don't
>    do hot-reset.

Exactly, the flag essentially indicates that the null-array approach is
available, lack of the flag indicates proof-of-ownership is required.
 
> This way we can still make noiommu devices support hot-reset
> just like VFIO_GROUP is on. Because noiommu devices have fake
> groups, such groups are all singleton. So checking all affected
> devices are opened by user is just same as check all affected
> groups.

Yep.

> > The only corner case with this option is when a user mixes group
> > and cdev usages. iirc you mentioned it's a valid usage to be supported.
> > In that case the kernel doesn't have sufficient knowledge to judge
> > 'resettable' as it doesn't know which groups are opened by this user.
> >
> > Not sure whether we can leave it in a ugly way so INFO may not tell
> > 'resettable' accurately in that weird scenario.  
> 
> This seems not easy to support. If above scenario is allowed there can be
> three cases that returns invalid dev_id.
> 1) devices not opened by user but owned implicitly

The cdev approach has a hard time with this in general, it has no way to
represent unopened devices. so any case where the nature of an unopened
device block reset on the dev-set is rather opaque to the user.

> 2) devices not owned by user

(and presumable not owned)  We still provide BDF.  Not much difference
from the group case here, being able to point to a BDF or group is
about all we can do.

> 3) devices opened via group but owned by user

I think this still works in the proof-of-ownership, passing fds to
hot-reset model.

> User would require more info to tell the above cases from each other.

Obviously we could be equivalent to the group model if IOMMU groups
were exposed for a device and all devices had IOMMU groups, but
reasons...

> > > array to associate affected, owned devices, and still has the
> > > equivalent information to know that one or more of the devices listed
> > > with an invalid dev-id are preventing the hot-reset from being
> > > available.
> > >
> > > Is that an option?  Thanks,
> > >  
> > 
> > This works for me if above corner case can be waived.  
> 
> One side check, perhaps already confirmed in prior email. @Alex, So
> the reason for the prediction of hot-reset is to avoid the possible
> vfio_pci_pre_reset() which does heavy operations like stop DMA and
> copy config space. Is it? Any other special reason? Anyhow, this reason
> is enough for this prediction per my understanding.

It's not clear to me what "prediction" is referring to.  As above, I
think we can redefine the reset-available flag I proposed to more
restrictively indicate that the null-array approach is available based
on the dev-set group in relation to the iommufd_ctx of the calling
device.  Prediction of the affected devices seems like basic
functionality to me, we can't assume the user's usage model, they must
be able to make a well informed decision regarding affected devices.
Thanks,

Alex



More information about the Intel-gfx mailing list