[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

Liu, Yi L yi.l.liu at intel.com
Mon Apr 17 04:20:27 UTC 2023


> From: Alex Williamson <alex.williamson at redhat.com>
> Sent: Saturday, April 15, 2023 1:11 AM
> 
> On Fri, 14 Apr 2023 11:38:24 +0000
> "Liu, Yi L" <yi.l.liu at intel.com> wrote:
> 
> > > From: Tian, Kevin <kevin.tian at intel.com>
> > > Sent: Friday, April 14, 2023 5:12 PM
> > >
> > > > From: Alex Williamson <alex.williamson at redhat.com>
> > > > Sent: Friday, April 14, 2023 2:07 AM
> > > >
> > > > We had already iterated a proposal where the group-id is replaced with
> > > > a dev-id in the existing ioctl and a flag indicates when the return
> > > > value is a dev-id vs group-id.  This had a gap that userspace cannot
> > > > determine if a reset is available given this information since un-owned
> > > > devices report an invalid dev-id and userspace can't know if it has
> > > > implicit ownership.
> > >
> > > >
> > > > It seems cleaner to me though that we would could still re-use INFO in
> > > > a similar way, simply defining a new flag bit which is valid only in
> > > > the case of returning dev-ids and indicates if the reset is available.
> > > > Therefore in one ioctl, userspace knows if hot-reset is available
> > > > (based on a kernel determination) and can pull valid dev-ids from the
> >
> > Need to confirm the meaning of hot-reset available flag. I think it
> > should at least meet below two conditions to set this flag. Although
> > it may not mean hot-reset is for sure to succeed. (but should be
> > a high chance).
> >
> > 1) dev_set is resettable (all affected device are in dev_set)
> > 2) affected device are owned by the current user
> 
> Per thread with Kevin, ownership can't always be known by the kernel.
> Beyond the group vs cdev discussion there, isn't it also possible
> (though perhaps not recommended) that a user can have multiple iommufd
> ctxs?  So I think 2) becomes "ownership of the affected dev-set can be
> inferred from the iommufd_ctx of the calling device", iow, the
> null-array calling model is available and the flag is redefined to
> match.  Reset may still be available via the proof-of-ownership model.

Yes, if there are multiple iommufd ctxs, this shall fall back to use
the proof-of-ownership model.

> 
> > Also, we need to has assumption that below two cases are rare
> > if user encounters it, it just bad luck for them. I think the existing
> > _INFO and hot-reset already has such assumption. So cdev mode
> > can adopt it as well.
> >
> > a) physical topology change (e.g. new devices plugged to affected slot)
> > b) an affected device is unbound from vfio
> 
> Yes, these are sufficiently rare that we can't do much about them.
> 
> > > So the kernel needs to compare the group id between devices with
> > > valid dev-ids and devices with invalid dev-ids to decide the implicit
> > > ownership. For noiommu device which has no group_id when
> > > VFIO_GROUP is off then it's resettable only if having a valid dev_id.
> >
> > In cdev mode, noiommu device doesn't have dev_id as it is not
> > bound to valid iommufd. So if VFIO_GROUP is off, we may never
> > allow hot-reset for noiommu devices. But we don't want to have
> > regression with noiommu devices. Perhaps we may define the usage
> > of the resettable flag like this:
> > 1) if it is set, user does not need to own all the affected devices as
> >     some of them may have been owned implicitly. Kernel should have
> >     checked it.
> > 2) if the flag is not set, that means user needs to check ownership
> >     by itself. It needs to own all the affected devices. If not, don't
> >    do hot-reset.
> 
> Exactly, the flag essentially indicates that the null-array approach is
> available, lack of the flag indicates proof-of-ownership is required.
> 
> > This way we can still make noiommu devices support hot-reset
> > just like VFIO_GROUP is on. Because noiommu devices have fake
> > groups, such groups are all singleton. So checking all affected
> > devices are opened by user is just same as check all affected
> > groups.
> 
> Yep.
> 
> > > The only corner case with this option is when a user mixes group
> > > and cdev usages. iirc you mentioned it's a valid usage to be supported.
> > > In that case the kernel doesn't have sufficient knowledge to judge
> > > 'resettable' as it doesn't know which groups are opened by this user.
> > >
> > > Not sure whether we can leave it in a ugly way so INFO may not tell
> > > 'resettable' accurately in that weird scenario.
> >
> > This seems not easy to support. If above scenario is allowed there can be
> > three cases that returns invalid dev_id.
> > 1) devices not opened by user but owned implicitly
> 
> The cdev approach has a hard time with this in general, it has no way to
> represent unopened devices. so any case where the nature of an unopened
> device block reset on the dev-set is rather opaque to the user.
> 
> > 2) devices not owned by user
> 
> (and presumable not owned)  We still provide BDF.  Not much difference
> from the group case here, being able to point to a BDF or group is
> about all we can do.
> 
> > 3) devices opened via group but owned by user
> 
> I think this still works in the proof-of-ownership, passing fds to
> hot-reset model.

Ok. let's see below scenario and user's processing makes sense.

Say there are five devices (devA, devB, devC, devD, devE) in the same reset
group. devA and devB are in the same iommu group. devC, devD, and devE have
separate iommu groups. Say devA is opened via cdev, devB is not opened, devC
is opened via group, devD is opened cdev but bound to another iommufdctx that
is different with devA. devE is not opened by any user

If this INFO is called on devA, user should get a valid dev_id for devA, but
four invalid dev_ids. The resettable flag should be clear. Below is how user
to handle the info returned.

- For devB, user shall get the group_id for devA, and also get group_id for
  devB, hence able to check ownership of devB by checking the group
- For devC, user can check ownership by the group_id and bdf returned
- For devD, if it is opened by the user, should be able to find it by bdf
- For devE, user shall fail to find it hence consider no ownership on it.

To finish the above check, user needs to get group_id via devid an also needs
to get group_id via device fd. Is it?

The above example may be the most tricky scenario. Is it? user shall not do
hot-reset as not all affected devices are owned by user. But if devE is also
opened by user, it could do hot-reset.

> > User would require more info to tell the above cases from each other.
> 
> Obviously we could be equivalent to the group model if IOMMU groups
> were exposed for a device and all devices had IOMMU groups, but
> reasons...
> 
> > > > array to associate affected, owned devices, and still has the
> > > > equivalent information to know that one or more of the devices listed
> > > > with an invalid dev-id are preventing the hot-reset from being
> > > > available.
> > > >
> > > > Is that an option?  Thanks,
> > > >
> > >
> > > This works for me if above corner case can be waived.
> >
> > One side check, perhaps already confirmed in prior email. @Alex, So
> > the reason for the prediction of hot-reset is to avoid the possible
> > vfio_pci_pre_reset() which does heavy operations like stop DMA and
> > copy config space. Is it? Any other special reason? Anyhow, this reason
> > is enough for this prediction per my understanding.
> 
> It's not clear to me what "prediction" is referring to.

It is predicting whether hot-reset ioctl can work or not as you mentioned
in prior discussion.[1].

"I disagree, as I've argued before, the info ioctl becomes so weak and
effectively arbitrary from a user perspective at being able to predict
whether the hot-reset ioctl works that it becomes useless, diminishing
the entire hot-reset info/execute API."

[1] https://lore.kernel.org/kvm/20230405134945.29e967be.alex.williamson@redhat.com/

> As above, I
> think we can redefine the reset-available flag I proposed to more
> restrictively indicate that the null-array approach is available based
> on the dev-set group in relation to the iommufd_ctx of the calling
> device.  Prediction of the affected devices seems like basic
> functionality to me, we can't assume the user's usage model, they must
> be able to make a well informed decision regarding affected devices.
> Thanks,

As my above reply with the five-device scenario. It still needs to get
group_id to check implicit ownership in the case of sharing the same
iommu_group.

Regards,
Yi Liu


More information about the Intel-gfx mailing list