[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
Tian, Kevin
kevin.tian at intel.com
Wed Apr 12 07:14:30 UTC 2023
> From: Alex Williamson <alex.williamson at redhat.com>
> Sent: Wednesday, April 12, 2023 5:58 AM
>
> On Tue, 11 Apr 2023 15:40:07 -0300
> Jason Gunthorpe <jgg at nvidia.com> wrote:
>
> > On Tue, Apr 11, 2023 at 11:11:17AM -0600, Alex Williamson wrote:
> > > [Appears the list got dropped, replying to my previous message to re-add]
> >
> > Wow, this got messed up a lot; mutt drops the cc when replying for some
> > reason. I think it is fixed up now
> >
> > > > Our cdev model says that opening a cdev locks out other cdevs from
> > > > independent use, eg because of the group sharing. Extending this to
> > > > include the reset group as well seems consistent.
> > >
> > > The DMA ownership model based on the IOMMU group is consistent with
> > > legacy vfio, but now you're proposing a new ownership model that
> > > optionally allows a user to extend their ownership, opportunistically
> > > lock out other users, and wreak havoc for management utilities that
> > > also have no insight into dev_sets or userspace driver behavior.
> >
> > I suggested below that the ownership require enough open devices - so
> > it doesn't "extend ownership opportunistically", and there is no
> > havoc.
> >
> > Management tools already need to understand dev_set if they want to
> > offer reliable reset support to the VMs. Same as today.
>
> I don't think that's true. Our primary hot-reset use case is GPUs and
> subordinate functions, where the isolation and reset scope are often
> sufficiently similar to make hot-reset possible, regardless of whether
> all the functions are assigned to a VM. I don't think you'll find any
> management tools that take reset scope into account otherwise.

If we only care about the primary case, where the iommu group and the
reset scope match, then why would the new claim model in Jason's proposal
urge the management tools to understand the reset scope now?

btw in your earlier replies you pointed out the issue of unpredictable
ordering on a multi-function device, e.g. whichever of dpdk or qemu runs
first will block the other. But I wonder what the actual use is of
allowing both to run when neither can do a reset, due to the affected
reset scope in the current model.

If a vfio user cannot do a reset, doesn't that imply it hasn't acquired
full permission on the device? In that case, isn't Jason's proposal of
explicitly failing it actually a cleaner model?

Thanks
Kevin