[Intel-gfx] [PATCH V2 5/8] mdev: introduce device specific ops

Michael S. Tsirkin mst at redhat.com
Thu Sep 26 16:34:24 UTC 2019


On Thu, Sep 26, 2019 at 10:26:08AM -0600, Alex Williamson wrote:
> On Thu, 26 Sep 2019 11:46:55 -0400
> "Michael S. Tsirkin" <mst at redhat.com> wrote:
> 
> > On Wed, Sep 25, 2019 at 10:30:28AM -0600, Alex Williamson wrote:
> > > On Wed, 25 Sep 2019 10:11:00 -0400
> > > Rob Miller <rob.miller at broadcom.com> wrote:  
> > > > > > On Tue, 24 Sep 2019 21:53:29 +0800
> > > > > > Jason Wang <jasowang at redhat.com> wrote:  
> > > > > > > diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> > > > > > > index 891cf83a2d9a..95efa054442f 100644
> > > > > > > --- a/drivers/vfio/mdev/vfio_mdev.c
> > > > > > > +++ b/drivers/vfio/mdev/vfio_mdev.c
> > > > > > > @@ -14,6 +14,7 @@
> > > > > > >  #include <linux/slab.h>
> > > > > > >  #include <linux/vfio.h>
> > > > > > >  #include <linux/mdev.h>
> > > > > > > +#include <linux/vfio_mdev.h>
> > > > > > >
> > > > > > >  #include "mdev_private.h"
> > > > > > >
> > > > > > > @@ -24,16 +25,16 @@
> > > > > > >  static int vfio_mdev_open(void *device_data)
> > > > > > >  {
> > > > > > >     struct mdev_device *mdev = device_data;
> > > > > > > -   struct mdev_parent *parent = mdev->parent;
> > > > > > > +   const struct vfio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
> > > > > > >     int ret;
> > > > > > >
> > > > > > > -   if (unlikely(!parent->ops->open))
> > > > > > > +   if (unlikely(!ops->open))
> > > > > > >             return -EINVAL;
> > > > > > >
> > > > > > >     if (!try_module_get(THIS_MODULE))
> > > > > > >             return -ENODEV;    
> > > > >    
> > > >   
> > > > > RJM>] My understanding lately is that this call to
> > > > > try_module_get(THIS_MODULE) is no longer needed and is considered a
> > > > > latent bug.
> > > > Quote from
> > > > https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put
> > > >  :
> > > > There are a number of uses of try_module_get(THIS_MODULE) in the kernel
> > > > source but most if not all of them are latent bugs that should be cleaned
> > > > up.  
> > > 
> > > This use seems to fall exactly into the case where it is necessary, the
> > > open here is not a direct VFS call, it's an internal interface between
> > > modules.  The user is interacting with filesystem objects from the vfio
> > > module and the module reference we're trying to acquire here is to the
> > > vfio-mdev module.  Thanks,
> > > 
> > > Alex  
> > 
> > 
> > I think the latent bug refers not to module get per se,
> > but to the module_put tied to it. E.g.:
> > 
> >  static void vfio_mdev_release(void *device_data)
> >  {
> >         struct mdev_device *mdev = device_data;
> >         struct mdev_parent *parent = mdev->parent;
> > 
> >         if (likely(parent->ops->release))
> >                 parent->ops->release(mdev);
> > 
> >         module_put(THIS_MODULE);
> > 
> > Does anything prevent the module from unloading at this point?
> > if not then ...
> > 
> > 
> >  }
> > 
> > it looks like the implicit return (with instructions for argument pop
> > and function return) here can get overwritten on module
> > unload, causing a crash when executed.
> > 
> > IOW there's generally no way for a module to keep a reference
> > to itself: it can take a reference, but it needs someone else
> > to hold it and put it.
> 
> I'd always assumed this would exit cleanly, but perhaps there is a
> latent race there.  In any case, taking a module reference within the
> module in this case is better than not doing so, as the latter would
> potentially allow the module to be removed at any point in time, while
> the former only seems to expose acquire and release gaps.  Add it to
> the todo list.  Thanks,
> 
> Alex

Right. I agree with the Stack Overflow quote: as this example seems to show,
this is a latent bug.

But I also agree that just removing the reference isn't the right way
to clean it up.
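
To illustrate the "someone else needs to keep it and put" point, here is
a rough userspace model of that pattern. It is only a sketch: every name
in it (model_module, model_ops, caller_open and so on) is made up for
illustration and is not a kernel API, it returns -ENODEV only to mirror
the error code used in the patch, and it is single-threaded, so it does
not model the actual race. What it shows is the ordering: the caller
takes the reference before calling into the callee and drops it only
after the callee's release has fully returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only: none of these names are kernel APIs. */
struct model_module {
	int refcnt;       /* references currently held by callers */
	bool unloading;   /* set once unload has been allowed */
};

struct model_ops {
	struct model_module *owner;   /* hypothetical owner field */
	int (*open)(void *dev);
	void (*release)(void *dev);
};

/* Take a reference unless the module is already going away. */
static bool model_try_get(struct model_module *m)
{
	if (m->unloading)
		return false;
	m->refcnt++;
	return true;
}

static void model_put(struct model_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Unload is refused while anyone still holds a reference. */
static bool model_try_unload(struct model_module *m)
{
	if (m->refcnt > 0)
		return false;
	m->unloading = true;
	return true;
}

/* Caller side: pin the owner before calling into it. */
static int caller_open(const struct model_ops *ops, void *dev)
{
	int ret;

	if (!model_try_get(ops->owner))
		return -ENODEV;

	ret = ops->open(dev);
	if (ret)
		model_put(ops->owner);   /* open failed, drop the pin */
	return ret;
}

/* Caller side: the put happens in the caller, after release returned. */
static void caller_release(const struct model_ops *ops, void *dev)
{
	ops->release(dev);
	model_put(ops->owner);
}

/* Callee side: note there is no get or put on itself here. */
static int demo_open(void *dev)
{
	*(int *)dev += 1;
	return 0;
}

static void demo_release(void *dev)
{
	*(int *)dev -= 1;
}
```

In this model the final put never runs in code that belongs to the
module being released, which is the property the self-put in
vfio_mdev_release() lacks.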

-- 
MST

