[Intel-gfx] [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset

Alex Williamson alex.williamson at redhat.com
Sat Apr 1 13:08:14 UTC 2023


On Sat, 1 Apr 2023 09:15:33 +0000
"Xu, Terrence" <terrence.xu at intel.com> wrote:

> > -----Original Message-----
> > From: intel-gvt-dev <intel-gvt-dev-bounces at lists.freedesktop.org> On
> > Behalf Of Alex Williamson
> > Sent: Saturday, April 1, 2023 1:50 AM
> > 
> > On Fri, 31 Mar 2023 17:27:27 +0000
> > "Xu, Terrence" <terrence.xu at intel.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Liu, Yi L <yi.l.liu at intel.com>
> > > > Sent: Monday, March 27, 2023 5:35 PM
> > > >
> > > > VFIO_DEVICE_PCI_HOT_RESET requires the user to pass an array of group
> > > > fds to prove that it owns all devices affected by resetting the
> > > > calling device. This series introduces several extensions to allow
> > > > the ownership check to be better aligned with iommufd and the upcoming
> > > > vfio device cdev support.
> > > >
> > > > First, resetting an unopened device is always safe given nobody is
> > > > using it. So relax the check to allow such devices not to be covered
> > > > by the group fd array. [1]
> > > >
> > > > When iommufd is used we can simply verify that all affected devices
> > > > are bound to the same iommufd, in which case there is no need for the
> > > > user to provide extra fd information. This is enabled by the user
> > > > passing a zero-length fd array, and moving forward this should be the
> > > > preferred way for hot reset. [2]
> > > >
> > > > However, the iommufd method has difficulty working with noiommu
> > > > devices since those devices don't have a valid iommufd, unless the
> > > > noiommu device is in a singleton dev_set and hence no ownership check
> > > > is required. [3]
> > > >
> > > > For noiommu backward compatibility a 3rd method is introduced by
> > > > allowing the user to pass an array of device fds to prove ownership.
> > > > [4]
> > > >
> > > > As suggested by Jason [5], this series introduces the above methods
> > > > to the vfio PCI hot reset. Per the discussion in [6], this series also
> > > > adds a new _INFO ioctl to get the hot reset scope for a given device.
> > > >
> > > > [1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
> > > > [2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
> > > > [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
> > > > [4] https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19@DS0PR11MB7529.namprd11.prod.outlook.com/#t
> > > > [5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
> > > > [6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/
> > > >
> > > > Change log:
> > > >
> > > > v2:
> > > >  - Split patch 03 of v1 into 03, 04 and 05 of v2 (Jason)
> > > >  - Add r-b from Kevin and Jason
> > > >  - Add patch 10 to introduce a new _INFO ioctl for device fd passing
> > > >    in the cdev path (Jason, Alex)
> > > >
> > > > v1:
> > > > https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/
> > > >
> > > > Regards,
> > > > 	Yi Liu
> > > >
> > > > Yi Liu (10):
> > > >   vfio/pci: Update comment around group_fd get in
> > > >     vfio_pci_ioctl_pci_hot_reset()
> > > >   vfio/pci: Only check ownership of opened devices in hot reset
> > > >   vfio/pci: Move the existing hot reset logic to be a helper
> > > >   vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
> > > >     vfio_device
> > > >   vfio/pci: Allow passing zero-length fd array in
> > > >     VFIO_DEVICE_PCI_HOT_RESET
> > > >   vfio: Refine vfio file kAPIs for vfio PCI hot reset
> > > >   vfio: Accpet device file from vfio PCI hot reset path
> > > >   vfio/pci: Renaming for accepting device fd in hot reset path
> > > >   vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
> > > >   vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> > > >
> > > >  drivers/iommu/iommufd/device.c   |  12 ++
> > > >  drivers/vfio/group.c             |  32 ++--
> > > >  drivers/vfio/iommufd.c           |  16 ++
> > > >  drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++-------
> > > >  drivers/vfio/vfio.h              |   2 +
> > > >  drivers/vfio/vfio_main.c         |  44 ++++++
> > > >  include/linux/iommufd.h          |   3 +
> > > >  include/linux/vfio.h             |  14 ++
> > > >  include/uapi/linux/vfio.h        |  65 +++++++-
> > > >  9 files changed, 364 insertions(+), 68 deletions(-)
> > > >
> > > > --
> > > > 2.34.1  
> > >
> > > Verified this series by "Intel GVT-g GPU device mediated passthrough".
> > > Passed VFIO legacy mode / compat mode / cdev mode basic functionality
> > > and GPU force reset test.
> > >
> > > Tested-by: Terrence Xu <terrence.xu at intel.com>  
> > 
> > Seems like only this "GPU force reset test" is relevant to the new
> > functionality of this series, GVT-g does not and has no reason to support the
> > HOT_RESET ioctls used here.  Can you provide more details of the force-reset
> > test?  What userspace driver is being used?  Thanks,
> > 
> > Alex  
> Hi Alex, about the "GPU force reset test", I used the "i915_hangman"
> test from intel-gpu-tools, which exercises GPU force hang / reset. It is
> an important regression test scenario for this patch series. To test the
> HOT_RESET ioctls themselves, we need to wait for the corresponding QEMU
> changes from Yi.

But i915 exists on the host root bus, and we fundamentally cannot perform
a bus reset of the root bus.  So how exactly is testing with GVT-g, which
doesn't use the vfio-pci-core hot-reset ioctl, or GVT-d, which can't do
a bus reset because it exists on the root bus, relevant to this series?
Is this some novel use of a dGPU i915 with out-of-tree drivers?

Obviously any regression testing is fine and appreciated, but if this
is intended to express some validation of the new interface, I'm
failing to see how.  Thanks,

Alex


