[Intel-gfx] [PATCH V2 5/8] mdev: introduce device specific ops
Michael S. Tsirkin
mst at redhat.com
Thu Sep 26 16:34:24 UTC 2019
On Thu, Sep 26, 2019 at 10:26:08AM -0600, Alex Williamson wrote:
> On Thu, 26 Sep 2019 11:46:55 -0400
> "Michael S. Tsirkin" <mst at redhat.com> wrote:
>
> > On Wed, Sep 25, 2019 at 10:30:28AM -0600, Alex Williamson wrote:
> > > On Wed, 25 Sep 2019 10:11:00 -0400
> > > Rob Miller <rob.miller at broadcom.com> wrote:
> > > > > > On Tue, 24 Sep 2019 21:53:29 +0800
> > > > > > Jason Wang <jasowang at redhat.com> wrote:
> > > > > > > diff --git a/drivers/vfio/mdev/vfio_mdev.c
> > > > > > b/drivers/vfio/mdev/vfio_mdev.c
> > > > > > > index 891cf83a2d9a..95efa054442f 100644
> > > > > > > --- a/drivers/vfio/mdev/vfio_mdev.c
> > > > > > > +++ b/drivers/vfio/mdev/vfio_mdev.c
> > > > > > > @@ -14,6 +14,7 @@
> > > > > > > #include <linux/slab.h>
> > > > > > > #include <linux/vfio.h>
> > > > > > > #include <linux/mdev.h>
> > > > > > > +#include <linux/vfio_mdev.h>
> > > > > > >
> > > > > > > #include "mdev_private.h"
> > > > > > >
> > > > > > > @@ -24,16 +25,16 @@
> > > > > > > static int vfio_mdev_open(void *device_data)
> > > > > > > {
> > > > > > > struct mdev_device *mdev = device_data;
> > > > > > > - struct mdev_parent *parent = mdev->parent;
> > > > > > > + const struct vfio_mdev_device_ops *ops =
> > > > > > mdev_get_dev_ops(mdev);
> > > > > > > int ret;
> > > > > > >
> > > > > > > - if (unlikely(!parent->ops->open))
> > > > > > > + if (unlikely(!ops->open))
> > > > > > > return -EINVAL;
> > > > > > >
> > > > > > > if (!try_module_get(THIS_MODULE))
> > > > > > > return -ENODEV;
> > > > >
> > > >
> > > > RJM>] My understanding lately is that this call to
> > > > try_module_get(THIS_MODULE) is no longer needed as is considered as a
> > > > latent bug.
> > > > Quote from
> > > > https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put
> > > > :
> > > > There are a number of uses of try_module_get(THIS_MODULE) in the kernel
> > > > source but most if not all of them are latent bugs that should be cleaned
> > > > up.
> > >
> > > This use seems to fall exactly into the case where it is necessary, the
> > > open here is not a direct VFS call, it's an internal interface between
> > > modules. The user is interacting with filesystem objects from the vfio
> > > module and the module reference we're trying to acquire here is to the
> > > vfio-mdev module. Thanks,
> > >
> > > Alex
> >
> >
> > I think the latent bug refers not to module get per se,
> > but to the module_put tied to it. E.g.:
> >
> > static void vfio_mdev_release(void *device_data)
> > {
> > struct mdev_device *mdev = device_data;
> > struct mdev_parent *parent = mdev->parent;
> >
> > if (likely(parent->ops->release))
> > parent->ops->release(mdev);
> >
> > module_put(THIS_MODULE);
> >
> > Does anything prevent the module from unloading at this point?
> > if not then ...
> >
> >
> > }
> >
> > it looks like the implicit return (with instructions for argument pop
> > and functuon return) here can get overwritten on module
> > unload, causing a crash when executed.
> >
> > IOW there's generally no way for module to keep a reference
> > to itself: it can take a reference but it needs someone else
> > to keep it and put.
>
> I'd always assumed this would exit cleanly, but perhaps there is a
> latent race there. In any case, taking a module reference within the
> module in this case is better than not doing so, as the latter would
> potentially allow the module to be removed at any point in time, while
> the former only seems to expose acquire and release gaps. Add it to
> the todo list. Thanks,
>
> Alex
Right. I agree with the stack overflow quote: as this example seems to show
this is a latent bug.
But I also agree that just removing the reference isn't the right way
to clean it up.
--
MST
More information about the Intel-gfx
mailing list