[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
Liu, Yi L
yi.l.liu at intel.com
Fri Apr 14 11:38:24 UTC 2023
> From: Tian, Kevin <kevin.tian at intel.com>
> Sent: Friday, April 14, 2023 5:12 PM
>
> > From: Alex Williamson <alex.williamson at redhat.com>
> > Sent: Friday, April 14, 2023 2:07 AM
> >
> > We had already iterated a proposal where the group-id is replaced with
> > a dev-id in the existing ioctl and a flag indicates when the return
> > value is a dev-id vs group-id. This had a gap that userspace cannot
> > determine if a reset is available given this information since un-owned
> > devices report an invalid dev-id and userspace can't know if it has
> > implicit ownership.
>
> >
> > It seems cleaner to me though that we could still re-use INFO in
> > a similar way, simply defining a new flag bit which is valid only in
> > the case of returning dev-ids and indicates if the reset is available.
> > Therefore in one ioctl, userspace knows if hot-reset is available
> > (based on a kernel determination) and can pull valid dev-ids from the
We need to confirm the meaning of the hot-reset-available flag. I think
at least the two conditions below should be met before the flag is set,
although setting it does not guarantee that hot-reset will succeed
(only that it is very likely to):
1) the dev_set is resettable (all affected devices are in the dev_set)
2) the affected devices are owned by the current user
Also, we need to assume that the two cases below are rare; if a user
encounters one of them, it is just bad luck. I think the existing
_INFO and hot-reset ioctls already make this assumption, so cdev mode
can adopt it as well:
a) physical topology change (e.g. new devices plugged into an affected slot)
b) an affected device is unbound from vfio
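To make the two conditions concrete, here is a rough sketch of the
check I have in mind. It is only an illustration, not the actual
vfio-pci code: struct affected_dev and hot_reset_available() are
hypothetical names, and the two booleans stand in for whatever the
kernel really uses to track dev_set membership and ownership.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device affected by the hot-reset. */
struct affected_dev {
	bool in_dev_set;	/* condition 1: device is part of the dev_set */
	bool owned_by_user;	/* condition 2: owned (explicitly or implicitly)
				 * by the user calling INFO */
};

/*
 * Return true only if every affected device meets both conditions.
 * A true result predicts that hot-reset is likely to succeed; it does
 * not cover topology changes or unbinds that happen afterwards.
 */
static bool hot_reset_available(const struct affected_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].in_dev_set || !devs[i].owned_by_user)
			return false;
	}
	return true;
}
```

A single device that is outside the dev_set or not owned by the caller
is enough to leave the flag clear.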
> So the kernel needs to compare the group id between devices with
> valid dev-ids and devices with invalid dev-ids to decide the implicit
> ownership. For noiommu device which has no group_id when
> VFIO_GROUP is off then it's resettable only if having a valid dev_id.
In cdev mode, a noiommu device doesn't have a dev_id, as it is not
bound to a valid iommufd. So if VFIO_GROUP is off, we may never
allow hot-reset for noiommu devices. But we don't want a regression
for noiommu devices. Perhaps we could define the usage of the
resettable flag like this:
1) If the flag is set, the user does not need to own all the affected
devices explicitly, as some of them may be owned implicitly. The
kernel should have checked this.
2) If the flag is not set, the user needs to check ownership by
itself. It needs to own all the affected devices; if it does not, it
should not do hot-reset.
This way noiommu devices can still support hot-reset just as they do
when VFIO_GROUP is on. Because noiommu devices have fake groups, and
such groups are all singletons, checking that all affected devices are
opened by the user is the same as checking all affected groups.
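From the userspace side, the proposed semantics could be sketched
roughly as below. Again this is only an illustration under my own
assumptions: struct info_dev, its opened_by_user field and
user_may_hot_reset() are hypothetical and not part of any existing
uAPI.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device record userspace builds from the INFO reply. */
struct info_dev {
	bool opened_by_user;	/* user has opened this affected device */
};

/*
 * Decide whether userspace should attempt hot-reset.
 * Case 1: flag set, the kernel has already checked ownership,
 *         including implicit ownership.
 * Case 2: flag clear, userspace must itself have opened every
 *         affected device.
 */
static bool user_may_hot_reset(bool resettable_flag,
			       const struct info_dev *devs, size_t n)
{
	if (resettable_flag)
		return true;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].opened_by_user)
			return false;
	}
	return true;
}
```

For a noiommu device with the flag clear, the loop is the fallback
path: since each fake group is a singleton, having every affected
device opened is equivalent to owning every affected group.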
> The only corner case with this option is when a user mixes group
> and cdev usages. iirc you mentioned it's a valid usage to be supported.
> In that case the kernel doesn't have sufficient knowledge to judge
> 'resettable' as it doesn't know which groups are opened by this user.
>
> Not sure whether we can leave it in an ugly way so INFO may not tell
> 'resettable' accurately in that weird scenario.
This does not seem easy to support. If the above scenario is allowed,
there can be three cases that return an invalid dev_id:
1) devices not opened by the user but owned implicitly
2) devices not owned by the user
3) devices opened via group but owned by the user
The user would need more info to tell these cases apart.
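The ambiguity can be shown with a small model. This is purely
illustrative: enum dev_state, SKETCH_INVALID_DEV_ID and
reported_dev_id() are made-up names, and the real invalid value is
whatever the uAPI ends up defining.

```c
/* Placeholder for the invalid dev_id value; an assumption of this sketch. */
#define SKETCH_INVALID_DEV_ID (-1)

/* Hypothetical states of an affected device from one user's viewpoint. */
enum dev_state {
	DEV_OPENED_VIA_CDEV,	/* opened by the user through cdev */
	DEV_IMPLICITLY_OWNED,	/* case 1: not opened, but owned implicitly */
	DEV_NOT_OWNED,		/* case 2: not owned by the user */
	DEV_OPENED_VIA_GROUP,	/* case 3: owned, but opened via group */
};

/*
 * What INFO would report for a device in the given state. Only a
 * cdev-opened device has a real dev_id; the other three states all
 * collapse into the same invalid value.
 */
static int reported_dev_id(enum dev_state state, int real_dev_id)
{
	return state == DEV_OPENED_VIA_CDEV ? real_dev_id
					    : SKETCH_INVALID_DEV_ID;
}
```

Since cases 1) to 3) all map to the same value, the dev_id alone
cannot tell the user which of them applies.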
> > array to associate affected, owned devices, and still has the
> > equivalent information to know that one or more of the devices listed
> > with an invalid dev-id are preventing the hot-reset from being
> > available.
> >
> > Is that an option? Thanks,
> >
>
> This works for me if above corner case can be waived.
One side check, perhaps already confirmed in a prior email. @Alex, so
the reason for predicting hot-reset availability is to avoid a possible
vfio_pci_pre_reset() call, which does heavy operations like stopping
DMA and copying config space. Is that right? Is there any other special
reason? Anyhow, per my understanding this reason alone is enough to
justify the prediction.
Regards,
Yi Liu