[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
Alex Williamson
alex.williamson at redhat.com
Sun Apr 9 13:29:51 UTC 2023
On Sun, 9 Apr 2023 19:58:47 +0800
Yi Liu <yi.l.liu at intel.com> wrote:
> On 2023/4/8 22:20, Alex Williamson wrote:
> > On Sat, 8 Apr 2023 05:07:16 +0000
> > "Liu, Yi L" <yi.l.liu at intel.com> wrote:
> >
> >>> From: Alex Williamson <alex.williamson at redhat.com>
> >>> Sent: Saturday, April 8, 2023 5:07 AM
> >>>
> >>> On Fri, 7 Apr 2023 15:47:10 +0000
> >>> "Liu, Yi L" <yi.l.liu at intel.com> wrote:
> >>>
> >>>>> From: Alex Williamson <alex.williamson at redhat.com>
> >>>>> Sent: Friday, April 7, 2023 11:14 PM
> >>>>>
> >>>>> On Fri, 7 Apr 2023 14:04:02 +0000
> >>>>> "Liu, Yi L" <yi.l.liu at intel.com> wrote:
> >>>>>
> >>>>>>> From: Alex Williamson <alex.williamson at redhat.com>
> >>>>>>> Sent: Friday, April 7, 2023 9:52 PM
> >>>>>>>
> >>>>>>> On Fri, 7 Apr 2023 13:24:25 +0000
> >>>>>>> "Liu, Yi L" <yi.l.liu at intel.com> wrote:
> >>>>>>>
> >>>>>>>>> From: Alex Williamson <alex.williamson at redhat.com>
> >>>>>>>>> Sent: Friday, April 7, 2023 8:04 PM
> >>>>>>>>>
> >>>>>>>>>>>>> @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> >>>>>>>>>>>>> if (!iommu_group)
> >>>>>>>>>>>>> return -EPERM; /* Cannot reset non-isolated devices */
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is disabling iommu a sane way to test vfio noiommu mode?
> >>>>>>>>>>>
> >>>>>>>>>>> Yes
> >>>>>>>>>>>
> >>>>>>>>>>>> I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> >>>>>>>>>>>> I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> >>>>>>>>>>>> iommufd==-1 can succeed, but failed to get hot reset info due to the above
> >>>>>>>>>>>> group check. Reason is that this happens to have some affected devices, and
> >>>>>>>>>>>> these devices have no valid iommu_group (because they are not bound to vfio-pci
> >>>>>>>>>>>> hence nobody allocates noiommu group for them). So when hot reset info loops
> >>>>>>>>>>>> such devices, it failed with -EPERM. Is this expected?
> >>>>>>>>>>>
> >>>>>>>>>>> Hmm, I didn't recall that we put in such a limitation, but given the
> >>>>>>>>>>> minimally intrusive approach to no-iommu and the fact that we never
> >>>>>>>>>>> defined an invalid group ID to return to the user, it makes sense that
> >>>>>>>>>>> we just blocked the ioctl for no-iommu use. I guess we can do the same
> >>>>>>>>>>> for no-iommu cdev.
> >>>>>>>>>>
> >>>>>>>>>> I just realize a further issue related to this limitation. Remember that we
> >>>>>>>>>> may finally compile out the vfio group infrastructure in the future. Say I
> >>>>>>>>>> want to test noiommu, I may boot such a kernel with iommu disabled. I think
> >>>>>>>>>> the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> >>>>>>>>>> not support hot reset for noiommu in future if vfio group infrastructure is
> >>>>>>>>>> compiled out?
> >>>>>>>>>
> >>>>>>>>> We're talking about IOMMU groups, IOMMU groups are always present
> >>>>>>>>> regardless of whether we expose a vfio group interface to userspace.
> >>>>>>>>> Remember, we create IOMMU groups even in the no-iommu case. Even with
> >>>>>>>>> pure cdev, there are underlying IOMMU groups that maintain the DMA
> >>>>>>>>> ownership.
> >>>>>>>>
> >>>>>>>> hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> >>>>>>>> given device unless it is registered to VFIO, which a fake group is created.
> >>>>>>>> That's why I hit the limitation [1]. When vfio_group is compiled out, then
> >>>>>>>> even fake group goes away.
> >>>>>>>
> >>>>>>> In the vfio group case, [1] can be hit with no-iommu only when there
> >>>>>>> are affected devices which are not bound to vfio.
> >>>>>>
> >>>>>> yes. because vfio would allocate fake group when device is registered to
> >>>>>> it.
> >>>>>>
> >>>>>>> Why are we not
> >>>>>>> allocating an IOMMU group to no-iommu devices when vfio group is
> >>>>>>> disabled? Thanks,
> >>>>>>
> >>>>>> hmmm. when the vfio group code is configured out. The
> >>>>>> vfio_device_set_group() just returns 0 after below patch is
> >>>>>> applied and CONFIG_VFIO_GROUP=n. So when there is no
> >>>>>> vfio group, the fake group also goes away.
> >>>>>>
> >>>>>> https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
> >>>>>
> >>>>> Is this a fundamental issue or just a problem with the current
> >>>>> implementation proposal? It seems like the latter. FWIW, I also don't
> >>>>> see a taint happening in the cdev path for no-iommu use. Thanks,
> >>>>
> >>>> yes. the latter case. The reason I raised it here is to confirm the
> >>>> policy on the new group/bdf capability in the DEVICE_GET_INFO. If
> >>>> there is no iommu group, perhaps I only need to exclude the new
> >>>> group/bdf capability from the cap chain of DEVICE_GET_INFO. is it?
> >>>
> >>> I think we need to revisit the question of why allocating an IOMMU
> >>> group for a no-iommu device is exclusive to the vfio group support.
> >>
> >> For no-iommu device, the iommu group is a fake group allocated by vfio.
> >> is it? And the fake group allocation is part of the vfio group code.
> >> It is the vfio_device_set_group() in group.c. If vfio group code is not
> >> compiled in, vfio does not allocate fake groups. Detail for this compiling
> >> can be found in link [1].
> >>
> >>> We've already been down the path of trying to report a field that only
> >>> exists for devices with certain properties with dev-id. It doesn't
> >>> work well. I think we've said all along that while the cdev interface
> >>> is device based, there are still going to be underlying IOMMU groups
> >>> for the user to be aware of, they're just not as much a fundamental
> >>> part of the interface. There should not be a case where a device
> >>> doesn't have a group to report. Thanks,
> >>
> >> As the patch in link [1] makes vfio group optional, so if compile a kernel
> >> with CONFIG_VFIO_GROUP=n, and boot it with iommu disabled, then there is no
> >> group to report. Perhaps this is not a typical usage but still a sane usage
> >> for noiommu mode as I confirmed with you in this thread. So when it comes,
> >> needs to consider what to report for the group field.
> >>
> >> Perhaps I messed up the discussion by referring to a patch that is part of
> >> another series. But I think it should be considered when talking about the
> >> group to be reported.
> >
> > The question is whether the split that group.c code handles both the
> > vfio group AND creation of the IOMMU group in such cases is the correct
> > split. I'm not arguing that the way the code is currently laid out has
> > the fake IOMMU group for no-iommu devices created in vfio group
> > specific code, but we have a common interface that makes use of IOMMU
> > group information for which we don't have an equivalent alternative
> > data field to report.
>
> yes. It is needed to ensure _HOT_RESET_INFO workable for noiommu devices.
>
> > We've shown that dev-id doesn't work here because dev-ids only exist
> > for devices within the user's IOMMU context. Also reporting an invalid
> > ID of any sort fails to indicate the potential implied ownership.
> > Therefore I recognize that if this interface is to report an IOMMU
> > group, then the creation of fake IOMMU groups existing only in vfio
> > group code would need to be refactored. Thanks,
>
> yeah, needs to move the iommu group creation back to vfio_main.c. This
> would be a prerequisite for [1]
>
> [1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
>
> I'll also try out your suggestion to add a capability like below and link
> it in the vfio_device_info cap chain.
>
> #define VFIO_DEVICE_INFO_CAP_PCI_BDF 5
>
> struct vfio_device_info_cap_pci_bdf {
> struct vfio_info_cap_header header;
> __u32 group_id;
> __u16 segment;
> __u8 bus;
> __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> };
>
Group-id and bdf should be separate capabilities: all devices should
report a group-id capability, and only PCI devices a bdf capability.
Thanks,
Alex
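
For illustration only, the split suggested above might look roughly like the sketch below. It is not taken from any posted patch. The names cap_header, cap_group_id, cap_pci_bdf, CAP_GROUP_ID, CAP_PCI_BDF and find_cap() are all hypothetical, the two ID values are placeholders, and cap_header is only a local stand-in that mirrors the layout of struct vfio_info_cap_header so the example builds without kernel headers. The walker assumes that each "next" field is an offset from the start of the info buffer and that 0 ends the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in mirroring struct vfio_info_cap_header (assumption). */
struct cap_header {
	uint16_t id;      /* capability ID */
	uint16_t version; /* capability version */
	uint32_t next;    /* offset of the next capability, 0 ends the chain */
};

/* Hypothetical IDs, chosen only for this sketch. */
#define CAP_GROUP_ID 5
#define CAP_PCI_BDF  6

/* Reported by every device: the IOMMU group it belongs to. */
struct cap_group_id {
	struct cap_header header;
	uint32_t group_id;
};

/* Reported by PCI devices only. */
struct cap_pci_bdf {
	struct cap_header header;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn; /* decode with PCI_SLOT/PCI_FUNC */
};

/* Both layouts are padding-free on common ABIs. */
static_assert(sizeof(struct cap_group_id) == 12, "unexpected padding");
static_assert(sizeof(struct cap_pci_bdf) == 12, "unexpected padding");

/*
 * Walk a capability chain stored in buf, starting at offset first.
 * Returns the offset of the first capability whose ID matches, or 0
 * if none is found (offset 0 is never a capability).
 */
static uint32_t find_cap(const unsigned char *buf, size_t len,
			 uint32_t first, uint16_t id)
{
	uint32_t off = first;

	while (off != 0 && len >= sizeof(struct cap_header) &&
	       off <= len - sizeof(struct cap_header)) {
		struct cap_header hdr;

		/* memcpy avoids unaligned access into the raw buffer */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.id == id)
			return off;
		if (hdr.next <= off) /* end of chain, or a backward link */
			break;
		off = hdr.next;
	}
	return 0;
}
```

With this shape, a non-PCI device would simply leave the bdf capability out of its chain, and userspace would treat a 0 return from find_cap() for CAP_PCI_BDF as "not a PCI device" rather than as an error.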