[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
Alex Williamson
alex.williamson at redhat.com
Mon Apr 17 20:06:42 UTC 2023
On Mon, 17 Apr 2023 16:31:56 -0300
Jason Gunthorpe <jgg at nvidia.com> wrote:
> On Mon, Apr 17, 2023 at 01:01:40PM -0600, Alex Williamson wrote:
> > Yes, it's not trivial, but Jason is now proposing that we consider
> > mixing groups, cdevs, and multiple iommufd_ctxs as invalid. I think
> > this means that regardless of which device calls INFO, there's only one
> > answer (assuming same set of devices opened, all cdev, all within same
> > iommufd_ctx). Based on what I explained about my understanding of INFO2
> > and Jason agreed to, I think the output would be:
> >
> > flags: NOT_RESETABLE | DEV_ID
> > {
> > { valid devA-id, devA-BDF },
> > { valid devC-id, devC-BDF },
> > { valid devD-id, devD-BDF },
> > { invalid dev-id, devE-BDF },
> > }
> >
> > Here devB gets dropped because the kernel understands that devB is
> > unopened, affected, and owned. It's therefore not a blocker for
> > hot-reset.
>
> I don't think we want to drop anything because it makes the API
> ill suited for the debugging purpose.
>
> devb should be returned with an invalid dev_id if I understand your
> example. Maybe it should return with -1 as the dev_id instead of 0, to
> make the debugging a bit better.
>
> Userspace should look at only NOT_RESETTABLE to determine if it
> proceeds or not, and it should use the valid dev_id list to iterate
> over the devices it has open to do the config stuff.
If an affected device is owned, not opened, and not interfering with
the reset, what is it adding to the API to report it for debugging
purposes? I'm afraid this leads into expanding "invalid dev-id" into an
errno or bitmap of error conditions that the user needs to parse.
> > OTOH, devE is unopened, affected, and un-owned, and we
> > previously agreed against the opportunistic un-opened/un-owned loophole.
>
> NOT_RESETABLE should be returned in this case, yes.
>
> If we want to enable userspace to use the loophole it should be an
> additional flag. RESETABLE_FOR_NOW or something
Ugh, please no. It's always a volatile result, but a volatile result
that relies on device state outside the scope or control of the user is
not even worthwhile imo. Thanks,
Alex
More information about the Intel-gfx
mailing list