[Intel-gfx] [PATCH v4] vfio: fix potential deadlock on vfio group lock
Alex Williamson
alex.williamson at redhat.com
Thu Jan 19 19:05:13 UTC 2023
On Thu, 19 Jan 2023 03:43:36 +0000
"Tian, Kevin" <kevin.tian at intel.com> wrote:
> > From: Matthew Rosato <mjrosato at linux.ibm.com>
> > Sent: Wednesday, January 18, 2023 10:56 PM
> >
> > On 1/18/23 4:03 AM, Tian, Kevin wrote:
> > >> From: Alex Williamson
> > >> Sent: Wednesday, January 18, 2023 5:23 AM
> > >>
> > >> On Fri, 13 Jan 2023 19:03:51 -0500
> > >> Matthew Rosato <mjrosato at linux.ibm.com> wrote:
> > >>
> > >>>  void vfio_device_group_close(struct vfio_device *device)
> > >>>  {
> > >>> +	void (*put_kvm)(struct kvm *kvm);
> > >>> +	struct kvm *kvm;
> > >>> +
> > >>>  	mutex_lock(&device->group->group_lock);
> > >>> +	kvm = device->kvm;
> > >>> +	put_kvm = device->put_kvm;
> > >>>  	vfio_device_close(device, device->group->iommufd);
> > >>> +	if (kvm == device->kvm)
> > >>> +		kvm = NULL;
> > >>
> > >> Hmm, so we're using whether the device->kvm pointer gets cleared in
> > >> last_close to detect whether we should put the kvm reference. That's a
> > >> bit obscure. Our get and put are also asymmetric.
> > >>
> > >> Did we decide that we couldn't do this via a schedule_work() from the
> > >> last_close function, ie. implementing our own version of an async put?
> > >> It seems like that potentially has a cleaner implementation, symmetric
> > >> call points, handling all the storing and clearing of kvm related
> > >> pointers within the get/put wrappers, passing only a vfio_device to the
> > >> put wrapper, using the "vfio_device_" prefix for both. Potentially
> > >> we'd just want an unconditional flush outside of lock here for
> > >> deterministic release.
> > >>
> > >> What's the downside? Thanks,
> > >>
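Roughly, the schedule_work() variant suggested above might look like the
sketch below. It is only an illustration of the shape of the idea, not the
posted patch: the kvm_put_work member and the vfio_device_put_kvm() name
are assumptions, and how the kvm module's symbols are referenced is elided.

/* Sketch of the async put, not the posted patch.  Assumes struct
 * vfio_device grows a kvm_put_work member. */
static void vfio_device_put_kvm_fn(struct work_struct *work)
{
	struct vfio_device *device =
		container_of(work, struct vfio_device, kvm_put_work);

	/* Runs in workqueue context with no vfio locks held, so a final
	 * kvm_put_kvm() can no longer recurse into vfio_file_set_kvm()
	 * and try to take group_lock underneath group_lock. */
	device->put_kvm(device->kvm);
	device->kvm = NULL;
	device->put_kvm = NULL;
}

/* Symmetric to the get wrapper: takes only the vfio_device and is
 * called from the last-close path with group_lock held. */
static void vfio_device_put_kvm(struct vfio_device *device)
{
	if (!device->kvm)
		return;
	schedule_work(&device->kvm_put_work);
}

An unconditional flush_work(&device->kvm_put_work) after group_lock is
dropped in vfio_device_group_close() would then give the deterministic
release mentioned above.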
> > >
> > > btw I guess this can be also fixed by Yi's work here:
> > >
> > > https://lore.kernel.org/kvm/20230117134942.101112-6-yi.l.liu@intel.com/
> > >
> > > with set_kvm(NULL) moved to the release callback of kvm_vfio device,
> > > such circular lock dependency can be avoided too.
> >
> > Oh, interesting... It seems to me that this would eliminate the reported call
> > chain altogether:
> >
> > kvm_put_kvm
> > -> kvm_destroy_vm
> > -> kvm_destroy_devices
> > -> kvm_vfio_destroy (starting here -- this would no longer be executed)
> > -> kvm_vfio_file_set_kvm
> > -> vfio_file_set_kvm
> > -> group->group_lock/group_rwsem
> >
> > Because kvm_destroy_devices can no longer end up calling kvm_vfio_destroy
> > and friends, it won't try to acquire the group lock a second time, which
> > makes it OK to do a kvm_put_kvm while the group lock is held. The
> > vfio_file_set_kvm call will now always come from a separate thread of
> > execution: kvm_vfio_group_add, kvm_vfio_group_del, or the release thread:
> >
> > kvm_device_release (where the group->group_lock would not be held since
> > vfio does not trigger closing of the kvm fd)
> > -> kvm_vfio_destroy (or, kvm_vfio_release)
> > -> kvm_vfio_file_set_kvm
> > -> vfio_file_set_kvm
> > -> group->group_lock/group_rwsem
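Concretely, the change the call chains above point at lives in the kvm-vfio
device's kvm_device_ops: today its teardown hangs off .destroy, which runs
synchronously from kvm_destroy_devices(), and moving it to .release makes it
run from kvm_device_release() when the device fd closes instead. A rough
sketch only, with the teardown body elided and the exact ops layout taken on
assumption rather than copied from Yi's series:

/* Sketch only -- not Yi's actual patch.  The point is which hook the
 * teardown hangs off of, not the body of the teardown itself. */
static void kvm_vfio_release(struct kvm_device *dev)
{
	/* Same work kvm_vfio_destroy() does today: for each tracked
	 * group file, kvm_vfio_file_set_kvm(file, NULL), drop the file
	 * reference, then free the device state.  Running it here means
	 * it can never nest inside a vfio group_lock holder doing the
	 * final kvm_put_kvm(). */
}

static struct kvm_device_ops kvm_vfio_ops = {
	.name = "kvm-vfio",
	/* was: .destroy = kvm_vfio_destroy, reached via kvm_put_kvm() ->
	 * kvm_destroy_vm() -> kvm_destroy_devices() */
	.release = kvm_vfio_release,
	/* .create / .set_attr / .has_attr unchanged ... */
};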
>
> Yes, that's my point. If Alex/Jason are also OK with it, Yi can probably
> send that patch separately as a fix for this issue. It's much simpler. 😊
If we can extract that flow separately from the cdev refactoring, ideally
as something that matches the stable kernel backport rules, then that
sounds like the preferred solution. Thanks,
Alex