[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

Jason Gunthorpe jgg at nvidia.com
Mon Apr 17 19:31:56 UTC 2023


On Mon, Apr 17, 2023 at 01:01:40PM -0600, Alex Williamson wrote:
> Yes, it's not trivial, but Jason is now proposing that we consider
> mixing groups, cdevs, and multiple iommufd_ctxs as invalid.  I think
> this means that regardless of which device calls INFO, there's only one
> answer (assuming same set of devices opened, all cdev, all within same
> iommufd_ctx).  Based on what I explained about my understanding of INFO2
> and Jason agreed to, I think the output would be:
> 
> flags: NOT_RESETABLE | DEV_ID
> {
>   { valid devA-id,  devA-BDF },
>   { valid devC-id,  devC-BDF },
>   { valid devD-id,  devD-BDF },
>   { invalid dev-id, devE-BDF },
> }
> 
> Here devB gets dropped because the kernel understands that devB is
> unopened, affected, and owned.  It's therefore not a blocker for
> hot-reset.

I don't think we want to drop anything, because that makes the API
ill-suited for debugging.

devB should be returned with an invalid dev_id, if I understand your
example. Maybe it should be returned with -1 as the dev_id instead of
0, to make debugging a bit easier.

Userspace should look only at NOT_RESETABLE to decide whether it
proceeds or not, and it should use the list of valid dev_ids to iterate
over the devices it has open to do the config stuff.
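
A minimal sketch of that userspace logic, to make the split between
"the flag decides" and "the list is for iteration" concrete. Every name
here (the SKETCH_* flags, the structs, the helper functions) is
hypothetical and is not the actual VFIO uAPI layout; it also assumes
the "-1 means invalid dev_id" convention suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only, not the real VFIO uAPI. */
#define SKETCH_FLAG_DEV_ID         (1u << 0)
#define SKETCH_FLAG_NOT_RESETABLE  (1u << 1)
#define SKETCH_INVALID_DEV_ID      UINT32_MAX   /* the "-1" convention */

struct sketch_entry {
	uint32_t dev_id;    /* SKETCH_INVALID_DEV_ID if not opened by us */
	uint16_t segment;   /* BDF fields, used only for debug reporting */
	uint8_t  bus;
	uint8_t  devfn;
};

struct sketch_info {
	uint32_t flags;
	uint32_t count;
	const struct sketch_entry *devices;
};

/* Only the flag decides whether userspace goes ahead with the reset. */
static bool sketch_can_reset(const struct sketch_info *info)
{
	assert(info != NULL);
	return !(info->flags & SKETCH_FLAG_NOT_RESETABLE);
}

/*
 * Collect the valid dev_ids, i.e. the devices this process has open.
 * Entries with an invalid dev_id are skipped here; they exist in the
 * list for debugging. Returns the number of ids written to out.
 */
static size_t sketch_collect_dev_ids(const struct sketch_info *info,
				     uint32_t *out, size_t cap)
{
	size_t n = 0;

	assert(info != NULL && out != NULL);
	if (!(info->flags & SKETCH_FLAG_DEV_ID))
		return 0;

	for (uint32_t i = 0; i < info->count && n < cap; i++) {
		if (info->devices[i].dev_id != SKETCH_INVALID_DEV_ID)
			out[n++] = info->devices[i].dev_id;
	}
	return n;
}
```

Note that the presence of invalid entries is deliberately not used as a
signal in sketch_can_reset(); an entry like devB can be invalid without
blocking the reset.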

> OTOH, devE is unopened, affected, and un-owned, and we
> previously agreed against the opportunistic un-opened/un-owned loophole.

NOT_RESETABLE should be returned in this case, yes.
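
As a rough sketch of the rule as I read it from this thread (not kernel
code): nothing is dropped from the report, unopened devices get an
invalid dev_id, and any device that is both unopened and un-owned sets
NOT_RESETABLE. The DEMO_* names, the struct, and the function are
hypothetical, and "opened implies owned" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
#define DEMO_FLAG_DEV_ID         (1u << 0)
#define DEMO_FLAG_NOT_RESETABLE  (1u << 1)
#define DEMO_INVALID_DEV_ID      UINT32_MAX

struct demo_dev {
	bool opened;      /* opened as a cdev in the caller's iommufd_ctx */
	bool owned;       /* kernel considers it owned by the caller */
	uint32_t dev_id;  /* meaningful only when opened */
};

/*
 * Fill ids_out[] with one entry per affected device (none dropped) and
 * return the flags. Assumes an opened device is always owned.
 */
static uint32_t demo_build_report(const struct demo_dev *devs, size_t n,
				  uint32_t *ids_out)
{
	uint32_t flags = DEMO_FLAG_DEV_ID;

	assert(devs != NULL && ids_out != NULL);
	for (size_t i = 0; i < n; i++) {
		ids_out[i] = devs[i].opened ? devs[i].dev_id
					    : DEMO_INVALID_DEV_ID;
		/* Unopened and un-owned: this device blocks the reset. */
		if (!devs[i].opened && !devs[i].owned)
			flags |= DEMO_FLAG_NOT_RESETABLE;
	}
	return flags;
}
```

With the devA..devE example above, devB and devE both come back with an
invalid dev_id, and devE alone is what sets NOT_RESETABLE.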

If we want to enable userspace to use the loophole, it should be via an
additional flag, RESETABLE_FOR_NOW or something.

> I think we're narrowing in on an interface that isn't as arbitrary.  If
> we assume the restrictions that Jason proposes, then cdev is exclusively
> a kernel determined reset availability model

Yes, I think this is probably best looking forward.

> where I'd agree that
> passing device-fds as a proof of ownership is pointless.  The group
> interface would therefore remain exclusively a proof-of-ownership
> model since we have no incentive to extend it to kernel-determined
> given the limited use case of all affected devices managed by the same
> vfio container.

Yes

> Moot, but there's actually enough information there to infer IOMMU
> groups for each device, but we probably can't prove that would always
> be the case.  If we adopt Jason's proposal though, I don't see that we
> need either a group-id or BDF capability, the BDF is only for debug
> reporting.  However, there is a new burden on the kernel to identify
> the affected, un-owned devices for that report.  

Yes and yes

Jason

