[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
Jason Gunthorpe
jgg at nvidia.com
Mon Apr 17 19:31:56 UTC 2023
On Mon, Apr 17, 2023 at 01:01:40PM -0600, Alex Williamson wrote:
> Yes, it's not trivial, but Jason is now proposing that we consider
> mixing groups, cdevs, and multiple iommufd_ctxs as invalid. I think
> this means that regardless of which device calls INFO, there's only one
> answer (assuming same set of devices opened, all cdev, all within same
> iommufd_ctx). Based on what I explained about my understanding of INFO2
> and Jason agreed to, I think the output would be:
>
> flags: NOT_RESETABLE | DEV_ID
> {
> { valid devA-id, devA-BDF },
> { valid devC-id, devC-BDF },
> { valid devD-id, devD-BDF },
> { invalid dev-id, devE-BDF },
> }
>
> Here devB gets dropped because the kernel understands that devB is
> unopened, affected, and owned. It's therefore not a blocker for
> hot-reset.
I don't think we want to drop anything, because that makes the API
ill-suited for debugging.
devB should be returned with an invalid dev_id, if I understand your
example. Maybe it should return -1 as the dev_id instead of 0, to
make debugging a bit easier.
Userspace should look only at NOT_RESETABLE to decide whether it
proceeds, and it should use the list of valid dev_ids to iterate
over the devices it has open to do the config stuff.
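For illustration only, the userspace side of that rule could look
something like the sketch below. The flag bits, the struct layout, and
every sketch_* name are made up for this example and are not the uAPI
under discussion; -1 as the invalid dev_id is the suggestion from the
previous paragraph.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag bits and layout, for illustration only. */
#define SKETCH_FLAG_DEV_ID        (1u << 0)
#define SKETCH_FLAG_NOT_RESETABLE (1u << 1)
#define SKETCH_INVALID_DEV_ID     UINT32_MAX	/* i.e. -1, not 0 */

struct sketch_dep_dev {
	uint32_t dev_id;	/* valid only for devices userspace has open */
	uint16_t segment;	/* BDF, used for debug reporting only */
	uint8_t bus;
	uint8_t devfn;
};

struct sketch_info {
	uint32_t flags;
	uint32_t count;
	const struct sketch_dep_dev *devs;
};

/*
 * Returns the number of valid dev_ids visited, or -1 if the kernel
 * reported NOT_RESETABLE. The flag is the only thing consulted to
 * decide whether to proceed; the BDFs are printed purely for debugging.
 */
static int sketch_walk(const struct sketch_info *info,
		       void (*cb)(uint32_t dev_id, void *opaque),
		       void *opaque)
{
	int visited = 0;
	uint32_t i;

	if (info->flags & SKETCH_FLAG_NOT_RESETABLE) {
		for (i = 0; i < info->count; i++) {
			const struct sketch_dep_dev *d = &info->devs[i];

			if (d->dev_id == SKETCH_INVALID_DEV_ID)
				fprintf(stderr,
					"affected, no dev_id: %04x:%02x:%02x.%x\n",
					(unsigned)d->segment, (unsigned)d->bus,
					(unsigned)(d->devfn >> 3),
					(unsigned)(d->devfn & 7));
		}
		return -1;
	}

	for (i = 0; i < info->count; i++) {
		/* Entries without a valid dev_id are informational. */
		if (info->devs[i].dev_id == SKETCH_INVALID_DEV_ID)
			continue;
		if (cb)
			cb(info->devs[i].dev_id, opaque);
		visited++;
	}
	return visited;
}
```

In this sketch an entry like devB (invalid dev_id, flag clear) is simply
skipped during iteration, while the same list with the flag set stops
the reset and only produces debug output.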
> OTOH, devE is unopened, affected, and un-owned, and we
> previously agreed against the opportunistic un-opened/un-owned loophole.
NOT_RESETABLE should be returned in this case, yes.
If we want to enable userspace to use the loophole, it should be an
additional flag, RESETABLE_FOR_NOW or something.
> I think we're narrowing in on an interface that isn't as arbitrary. If
> we assume the restrictions that Jason proposes, then cdev is exclusively
> a kernel determined reset availability model
Yes, I think this is probably best going forward.
> where I'd agree that
> passing device-fds as a proof of ownership is pointless. The group
> interface would therefore remain exclusively a proof-of-ownership
> model since we have no incentive to extend it to kernel-determined
> given the limited use case of all affected devices managed by the same
> vfio container.
Yes
> Moot, but there's actually enough information there to infer IOMMU
> groups for each device, but we probably can't prove that would always
> be the case. If we adopt Jason's proposal though, I don't see that we
> need either a group-id or BDF capability, the BDF is only for debug
> reporting. However, there is a new burden on the kernel to identify
> the affected, un-owned devices for that report.
Yes and yes
Jason