[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

Liu, Yi L yi.l.liu at intel.com
Sun Apr 23 14:46:57 UTC 2023


> From: Jason Gunthorpe <jgg at nvidia.com>
> Sent: Saturday, April 22, 2023 6:36 AM
> 
> On Thu, Apr 20, 2023 at 08:08:39AM -0600, Alex Williamson wrote:
> 
> > > Hiding this device in the list looks fine to me. But the calling user
> > > should not open any new device before finishing the hot-reset. Otherwise,
> > > the user may miss a device that needs pre/post reset handling. I think
> > > this requirement is acceptable. Is it?
> >
> > I think Kevin and Jason are leaning towards reporting the entire
> > dev-set.  The INFO ioctl has always been a point-in-time reading, no
> > guarantees are made if the host or user configuration is changed.
> > Nothing changes in that respect.
> 
> Yeah, I think your point about the qemu community forums suggests we
> should err toward having qemu provide some fully detailed debug report.
> 
> > > > Whereas dev-id < 0
> > > > (== -1) is an affected device which prevents hot-reset, ex. an un-owned
> > > > device, device configured within a different iommufd_ctx, or device
> > > > opened outside of the vfio cdev API."  Is that about right?  Thanks,
> > >
> > > Do you mean to have separate err-code for the three possibilities? As
> > > the devid is generated by iommufd and it is u32. I'm not sure if we can
> > > have such err-code definition without reserving some ids in iommufd.
> >
> > Yes, if we're going to report the full dev-set, I think we need at
> > least two unique error codes or else the user has no way to determine
> > the subset of invalid dev-ids which block the reset.
> 
> If you think this is important to report we should report 0 and -1,
> and adjust the iommufd xarray allocator to reserve -1

Then the alloc range should be from 1 to 0xfffffffe, since 0 and
0xffffffff (-1 as a u32) would both be reserved.
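
To make the range concrete, here is a rough userspace model, not iommufd
code. All names in it are hypothetical. It assumes the ID is a u32, that 0
and -1 (0xffffffff as a u32) are both reserved, and that IDs are never
freed or reused. In the kernel the bounds would presumably be expressed
through the limit passed to xa_alloc() rather than a counter like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, not taken from iommufd or the vfio uapi. */
#define DEVID_NOT_BLOCKING 0u                  /* "0" in this thread */
#define DEVID_BLOCKING     UINT32_MAX          /* "-1" stored in a u32 */
#define DEVID_ALLOC_MIN    1u                  /* first allocatable ID */
#define DEVID_ALLOC_MAX    (UINT32_MAX - 1u)   /* last allocatable ID */

/* Toy allocator: a monotonically increasing counter, no free/reuse. */
struct id_alloc {
	uint32_t next;
};

static void id_alloc_init(struct id_alloc *a)
{
	a->next = DEVID_ALLOC_MIN;
}

/*
 * Hand out the next ID in [DEVID_ALLOC_MIN, DEVID_ALLOC_MAX].
 * Returns false once the range is exhausted, so neither reserved
 * value (0 or 0xffffffff) is ever returned as a real ID.
 */
static bool id_alloc_get(struct id_alloc *a, uint32_t *out)
{
	if (a->next < DEVID_ALLOC_MIN || a->next > DEVID_ALLOC_MAX)
		return false;
	*out = a->next++;
	return true;
}
```

With the inclusive upper bound at 0xfffffffe, a returned ID can never
collide with the -1 marker.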
 
> 
> It depends what you want to show for the debugging.
> 
> eg if we have debugging where qemu dumps this table:
> 
>    BDF   In VM   iommu_group   Has VFIO driver   Has Kernel Driver
> 
> By also doing various sysfs probes based on the BDF, then the admin
> action to remedy the situation is:
> 
> Make "Has VFIO driver = y" or "Has Kernel Driver = n" for every row in
> the table to make the reset work.
> 
> And we don't need the distinction. Adding the 0/-1 lets you make a
> useful table without doing any sysfs work.
>
> > I think Jason is proposing the set of valid dev-ids are >0, a dev-id
> > of zero indicates some form of non-blocking, while <0 (or maybe
> > specifically -1) indicates a blocking device.
> 
> Yes, 0 and -1 would be fine with those definitions. The only use of
> the data is to add a 'blocking use of reset' column to the table
> above.

Should -1 and 0 be defined in the uapi as well? If so, it does not seem
easy to find proper names for them. Alternatively, we could just document
in the vfio uapi header that -1 means blocking and 0 means
no-devid-but-not-blocking.
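
For illustration only, this is roughly how userspace could consume those
values under the definitions discussed above (>0 valid dev-id, 0
non-blocking, <0 blocking). The enum and function names are placeholders I
made up, not proposed uapi names, and the sketch treats any value with the
top bit set as "negative" rather than only -1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder names; nothing here exists in the vfio uapi header. */
enum devid_class {
	DEVID_CLASS_VALID,        /* > 0: a real iommufd dev-id */
	DEVID_CLASS_NOT_BLOCKING, /* 0: no dev-id, but does not block reset */
	DEVID_CLASS_BLOCKING,     /* < 0 (e.g. -1): blocks the hot-reset */
};

/* Classify one reported dev-id, reading the u32 as a signed value. */
static enum devid_class classify_devid(uint32_t devid)
{
	if (devid == 0)
		return DEVID_CLASS_NOT_BLOCKING;
	if (devid > (uint32_t)INT32_MAX)
		return DEVID_CLASS_BLOCKING;
	return DEVID_CLASS_VALID;
}

/* Value for a 'blocking use of reset' column in a debug table. */
static const char *blocking_column(uint32_t devid)
{
	return classify_devid(devid) == DEVID_CLASS_BLOCKING ? "y" : "n";
}

/* The reset is blocked if any device in the reported set is blocking. */
static bool reset_blocked(const uint32_t *devids, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (classify_devid(devids[i]) == DEVID_CLASS_BLOCKING)
			return true;
	}
	return false;
}
```

With something like this, the debug table can be filled in from the INFO
output alone, without any sysfs probing.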

Regards,
Yi Liu

