[Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

Jason Gunthorpe jgg at nvidia.com
Wed Apr 12 20:06:37 UTC 2023


On Wed, Apr 12, 2023 at 10:50:45AM -0600, Alex Williamson wrote:

> > You keep going back to this, but I maintain userspace doesn't
> > care. qemu is given a list of VFIO devices to use, all it wants to
> > know is if it is allowed to use reset or not. Why should it need to
> > know groups and group_ids to get that binary signal out of the kernel?
> 
> hw/vfio/pci.c:2320
>         error_report("vfio: Cannot reset device %s, "
>                      "depends on group %d which is not owned.",
>                      vdev->vbasedev.name, devices[i].group_id);
> 
> That creates a feedback loop where a user can take corrective action
> with actual information in hand to resolve the issue.

Which is why I listed debugging as requirement #4, and solve
requirement #4 by using the existing INFO and printing the BDF list it
returns.

> > The kicker is we don't force the user to generate a de-duplicated list
> > of devices FDs, one per group, just because.
> 
> So on one hand you're asking for simplicity, but on the other you're
> criticizing a trivial simplification that we chose to allow the user to
> pass number of group fds equal to number of devices affected so that
> the user doesn't need to take that step to de-duplicate the list.  We
> can't win.

It is not a simplification because the kernel is wired to accept only
a list of exactly that group length, no more no less. It turns into a
pointless puzzle that userspace has to solve, and it can only solve it
by knowing about groups.

If we get rid of groups we have to do something about this so
userspace doesn't need to do the calculation. That is the point of
this change.

> > > You've said that hot-reset works if the iommufd_ctx has
> > > representation from each affected group, the INFO ioctl remains as
> > > it is, which suggests that it's reporting group ID and BDF, yet only
> > > sysfs tells the user the relation between a vfio cdev and a group
> > > and we're trying to enable a pass-by-fd model for cdev where the
> > > user has no reference to a sysfs node for the device.  Show me how
> > > these pieces fit together.  
> > 
> > I prefer the version where INFO2 returns the dev_id, but info can work
> > if we do the BDF cap like you suggested to Yi
> 
> As discussed ad nauseam, dev-id is useless if an affected device is not
> already within the iommufd ctx.  

The purpose of INFO2 is to satisfy requirement #3 - which is to report
the effected devices *that are already opened*. For this dev_id is
fine. There is nothing qemu can do with devices that are outside its
iommufdctx, so it is pointless to tell it about them. It will generate
the debug print of #4 using INFO. I don't think we don't need one API here.

> > I see this problem as a few basic requirements from a qemu-like
> > application:
> > 
> >  1) Does the configuration I was given support reset right now?
> >  2) Will the configuration I was given support reset for the duration
> >     of my execution?
> >  3) What groups of the devices I already have open does the reset
> >     effect?
> >  4) For debugging, report to the user the full list of devices in the
> >     reset group, in a way that relates back to sysfs.
> >  5) Away to trigger a reset on a group of devices
> > 
> > #1/#2 is the API I suggested here. Ask the kernel if the current
> > configuration works, and ask it to keep it working.
> 
> That is super sketchy because you're also advocating for
> opportunistically supporting reset if the instantaneous conditions
> allow is (ex. unopened devices), and going back and forth whether "ask
> it to keep working" suggests that a user is able to extend their
> granted ownership themselves.  I think both needs to be based on some
> form of granted, not requested, ownership and not opportunism.

Ok, lets give up on ownership then

> > #3 is either INFO and a CAP for BDF or INFO2 reporting dev_id
> 
> Where dev-id is useful for... ?  I think there's a misuse of "groups"
> in 3) above, userspace needs to know specific devices affected, thus
> BDF.

I did not mean "group of devices" to mean iommu_group, I mean "the set of
devices affected by the reset"

> > #4 is either INFO and print the BDFs or INFO2 reporting the struct
> > vfio_device IDR # (eg /sys/class/vfio/vfioXXX/).
> 
> We can't assume that all the affected devices are bound to vfio,
> therefore we cannot assume a vfio_device IDR exists.

So BDF is better for the debugging print.

> > #5 is adjusting the FD list in existing RESET ioctl. Remove the need
> > for userspace to specify a minimal exact list of FDs means userspace
> > doesn't need the information to figure out what that list actually
> > is. Pass a 0 length list and use iommufdctx.
> 
> "...doesn't need the information to figure out what the list actually
> is."  That's false, userspace needs the information whether it uses it
> to make a list or not,

#3 is the need of affected devices, it is already covered.

I mean that #5 should not need this, #5 is only about triggering the
reset.

What I want is a #5 action that does not require doing a calcuation on
group IDs.

At the core, without any notion of groups, #5 requires userspace to
pass in every opened device FD and kernel checks that every opened
device is in the passed FD list. Close devices are ignored. Devices
with unattached drivers are ignored.

#5 does not need the answer to requirement #2.

> So we need one or more ioctls that a) indicates whether
> the ownership requirements are met 

If we reject the ownership direction, then I go back to suggesting
that INFO2 should do this.

> b) indicates the set of affected
> devices.  

INFO2 will return the dev_id which is sufficient to satisfy
requirement #3

> Is b) only the set of affected devices within the calling
> devices iommufd_ctx (ie. dev-ids),

I vote yes

> in which case we need c) a way to
> report the overall set of affected devices regardless of ownership in
> support of 4), BDF?

Yes, continue to use INFO unmodified.
 
> Are we back to replacing group-ids with dev-ids in the INFO structure,
> where an invalid dev-id either indicates an affected device with
> implied ownership (ok) or a gap in ownership (bad) and a flag somewhere
> is meant to indicate the overall disposition based on the availability
> of reset?  

As you explore in the following this gets ugly. I prefer to keep INFO
unchanged and add INFO2.

So maybe we should make patches that look something like this, try to
come up with a workable INFO2 and squeeze no-iommu into it somehow.

Jason


More information about the Intel-gfx mailing list