[PATCH 6/6] drm/tinydrm: Support device unplug

Noralf Trønnes noralf at tronnes.org
Thu Aug 31 19:22:03 UTC 2017


Den 31.08.2017 14.59, skrev Laurent Pinchart:
> Hi Daniel and Noralf,
>
> On Wednesday, 30 August 2017 20:18:49 EEST Daniel Vetter wrote:
>> On Wed, Aug 30, 2017 at 6:31 PM, Noralf Trønnes <noralf at tronnes.org> wrote:
>>> Den 28.08.2017 23.56, skrev Daniel Vetter:
>>>> On Mon, Aug 28, 2017 at 07:17:48PM +0200, Noralf Trønnes wrote:
>>>>> Support device unplugging to make tinydrm suitable for USB devices.
>>>>>
>>>>> Cc: David Lechner <david at lechnology.com>
>>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
>>>>> ---
>>>>>
>>>>> drivers/gpu/drm/tinydrm/core/tinydrm-core.c | 69 ++++++++++++++++++---
>>>>> drivers/gpu/drm/tinydrm/mi0283qt.c          |  4 ++
>>>>> drivers/gpu/drm/tinydrm/mipi-dbi.c          |  5 ++-
>>>>> drivers/gpu/drm/tinydrm/repaper.c           |  9 +++-
>>>>> drivers/gpu/drm/tinydrm/st7586.c            |  9 +++-
>>>>> include/drm/tinydrm/tinydrm.h               |  5 +++
>>>>> 6 files changed, 87 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/tinydrm/core/tinydrm-core.c
>>>>> b/drivers/gpu/drm/tinydrm/core/tinydrm-core.c
>>>>> index f11f4cd..3ccbcc5 100644
>>>>> --- a/drivers/gpu/drm/tinydrm/core/tinydrm-core.c
>>>>> +++ b/drivers/gpu/drm/tinydrm/core/tinydrm-core.c
>>>>> @@ -32,6 +32,29 @@
>>>>>    * The driver allocates &tinydrm_device, initializes it using
>>>>>    * devm_tinydrm_init(), sets up the pipeline using
>>>>> tinydrm_display_pipe_init()
>>>>>    * and registers the DRM device using devm_tinydrm_register().
>>>>> + *
>>>>> + * Device unplug
>>>>> + * -------------
>>>>> + *
>>>>> + * tinydrm supports device unplugging when there's still open DRM or
>>>>> fbdev file
>>>>> + * handles.
>>>>> + *
>>>>> + * There are 2 ways for driver-device unbinding to happen:
>>>>> + *
>>>>> + * - The driver module is unloaded causing the driver to be
>>>>> unregistered.
>>>>> + *   This can't happen as long as there's open file handles because a
>>>>> reference
>>>>> + *   is taken on the module.
>>>> Aside: you can do that, but then it works like a hw unplug, through the
>>>> unbind property in sysfs.
>>> The module is still pinned isn't it?
>> Yup, but only the code is pinned, not any of the datastructures like
>> drm_device.
>>
>>> I can add sysfs unbind as a third way of triggering unbind:
>>>   * - The sysfs driver _unbind_ file can be used to unbind the driver form
>>> the
>>>   *   device. This can happen any time.
>>>   *
> Sounds good to me.
>
>>>>> + *
>>>>> + * - The device is removed (USB, Device Tree overlay).
>>>>> + *   This can happen at any time.
>>>>> + *
>>>>> + * The driver needs to protect device resources from access after the
>>>>> device is
>>>>> + * gone. This is done checking drm_dev_is_unplugged(), typically in
>>>>> + * &drm_framebuffer_funcs.dirty, &drm_simple_display_pipe_funcs.enable
>>>>> and
>>>>> + * \.disable. Resources that doesn't face userspace and is only used
> s/doesn't/don't/
> s/and is/and are/
>
>>>>> with the
>>>>> + * device can be setup using devm\_ functions, but &tinydrm_device must
>>>>> be
>>>>> + * allocated using plain kzalloc() since it's lifetime can exceed that
>>>>> of the
> s/it's/its/
>
>>>>> + * device. tinydrm_release() will free the structure.
>>>> So here's a bit a dragon: There's no prevention of is_unplugged racing
>>>> against a drm_dev_unplug(). There's been various attempts to fixing this,
>>>> but they're all somewhat ugly.
>>>>
>>>> Either way, that's a bug in the drm core :-)
>>>>
>>>>>     */
>>>>>     
>>>>>      /**
>>>>> @@ -138,6 +161,29 @@ static const struct drm_mode_config_funcs
>>>>> tinydrm_mode_config_funcs = {
>>>>>          .atomic_commit = drm_atomic_helper_commit,
>>>>>    };
>>>>> +/**
>>>>> + * tinydrm_release - DRM driver release helper
>>>>> + * @drm: DRM device
>>>>> + *
>>>>> + * This function cleans up and finalizes &drm_device and frees
>>>>> &tinydrm_device.
>>>>> + *
>>>>> + * Drivers must use this as their &drm_driver->release callback.
>>>>> + */
>>>>> +void tinydrm_release(struct drm_device *drm)
>>>>> +{
>>>>> +       struct tinydrm_device *tdev = drm_to_tinydrm(drm);
>>>>> +
>>>>> +       DRM_DEBUG_DRIVER("\n");
>>>>> +
>>>>> +       drm_mode_config_cleanup(drm);
>>>>> +       drm_dev_fini(drm);
>>>>> +
>>>>> +       mutex_destroy(&tdev->dirty_lock);
>>>>> +       kfree(tdev->fbdev_cma);
>>>>> +       kfree(tdev);
>>>>> +}
>>>>> +EXPORT_SYMBOL(tinydrm_release);
>>>>> +
>>>>>    static int tinydrm_init(struct device *parent, struct tinydrm_device
>>>>> *tdev,
>>>>>                          const struct drm_framebuffer_funcs *fb_funcs,
>>>>>                          struct drm_driver *driver)
>>>>> @@ -160,8 +206,6 @@ static int tinydrm_init(struct device *parent,
>>>>> struct tinydrm_device *tdev,
>>>>>    static void tinydrm_fini(struct tinydrm_device *tdev)
>>>>>    {
>>>>> -       drm_mode_config_cleanup(&tdev->drm);
>>>>> -       mutex_destroy(&tdev->dirty_lock);
>>>>>          drm_dev_unref(&tdev->drm);
>>>>>    }
>>>>> @@ -178,8 +222,8 @@ static void devm_tinydrm_release(void *data)
>>>>>    * @driver: DRM driver
>>>>>    *
>>>>>    * This function initializes @tdev, the underlying DRM device and it's
> While at it, s/it's/its/
>
>>>>> - * mode_config. Resources will be automatically freed on driver detach
>>>>> (devres)
>>>>> - * using drm_mode_config_cleanup() and drm_dev_unref().
>>>>> + * mode_config. drm_dev_unref() is called on driver detach (devres) and
>>>>> when
>>>>> + * all refs are dropped, tinydrm_release() is called.
>>>>>    *
>>>>>    * Returns:
>>>>>    * Zero on success, negative error code on failure.
>>>>> @@ -226,14 +270,17 @@ static int tinydrm_register(struct tinydrm_device
>>>>> *tdev)
>>>>>
>>>>>    static void tinydrm_unregister(struct tinydrm_device *tdev)
>>>>>    {
>>>>> -       struct drm_fbdev_cma *fbdev_cma = tdev->fbdev_cma;
>>>>> -
>>>>>          drm_atomic_helper_shutdown(&tdev->drm);
>>>>>
>>>>> -       /* don't restore fbdev in lastclose, keep pipeline disabled */
>>>>> -       tdev->fbdev_cma = NULL;
>>>>> -       drm_dev_unregister(&tdev->drm);
>>>>> -       if (fbdev_cma)
>>>>> -               drm_fbdev_cma_fini(fbdev_cma);
>>>>> +
>>>>> +       /* Get a ref that will be put in tinydrm_fini() */
>>>>> +       drm_dev_ref(&tdev->drm);
>>>> Why do we need that private ref? Grabbing references in unregister code
>>>> looks like a recipe for leaks ...
>>> Yeah, it's better to take the second ref in devm_tinydrm_register().
>>>
>>> The reason I need 2 refs is because tinydrm is set up with 2 devm_
>>> functions. This way I can leave all error path cleanup in the hands of
>>> devres.
>>>
>>> static int driver_probe(...)
>>> {
>>>      ret = devm_tinydrm_init(...);
>>>      if (ret)
>>>          return ret;
>>>      
>>>      ret = tinydrm_display_pipe_init(...);
>>>      if (ret)
>>>          return ret;
>>>      
>>>      drm_mode_config_reset(...);
>>>      
>>>      return devm_tinydrm_register(...);
>>> }
>> I don't think this works. What's worse, devm has a high chance of
>> freeing stuff at the wrong time since it's tied to the
>> drm_device->dev, and not to drm_device.
>>
>> I'm not sure what a better solution is, but when discussing all this
>> with Laurent we agreed that in general, devm_ needs to be considered
>> harmful. Looks simple, very easy to create broken code that doesn't
>> clean up properly on unplug. As much as leaks are bad, freeing before
>> all users are gone is worse. devm_ "fixes" the former and heavily
>> encourages the latter.
> I totally agree. A crash at unbind time is worse than a memory leak given that
> the bind/unbind cycles are unfrequent. Of course we need to aim for fixing
> both problems :-)

I'm not sure I see how the use of devm_ causes any other problems than
the fact that the parent device has been deleted. In the case of usb,
after usb_driver.disconnect it's not allowed to access the usb_device.
This means that the driver has to protect against this. The same applies
to the devm_ resources, after the device is gone, they're not available
anymore. It ofc is easier to protect one resource than several, but in
principal it's the same, isn't it?

For me the beauty of devres is that it removes the need for error paths
that are almost never exercised and the probe code is simpler and easy
to read.

>> In your case I think devm_tinydrm_register is correct (but a bit
>> strange).
> Yes, I don't think an explicit tinydrm_unregister() call in the driver's
> .remove() handler would be so difficult.
>
>> devm_tinydrm_init otoh looks like all the code it has should be moved into
>> the drm_driver->release callback. You can't destroy the dirty_lock while a
>> thread might be holding it right at that moment.
> Agreed too.
>
> Furthermore the devm_kzalloc() in tinydrm_display_pipe_init() seems suspicious
> to me. It should be easy to replace it with kzalloc(), the mode can be freed
> in tinydrm_connector_destroy().

It's safe since it's protected by connector detect(), but I agree with you.

> (And on a side note the direct mode copy without calling drm_mode_duplicate()
> in that function is also suspicious, although it might not be an issue in
> practice)
>
>>>>> +
>>>>> +       drm_fbdev_cma_dev_unplug(tdev->fbdev_cma);
>>>>> +       drm_dev_unplug(&tdev->drm);
>>>>> +
>>>>> +       /* Make sure framebuffer flushing is done */
>>>>> +       mutex_lock(&tdev->dirty_lock);
>>>>> +       mutex_unlock(&tdev->dirty_lock);
>>>> Is this really needed? Or, doesn't it just paper over a driver bug you
>>>> have already anyway, since native kms userspace can directly call
>>>> fb->funcs->dirty too, and you already protect against that.
>>>>
>>>> This definitely looks like the fbdev helper is leaking implementation
>>>> details to callers where it shouldn't do that.
>>> Flushing can happen while drm_dev_unplug() is called, and when we leave
>>> this function the device facing resources controlled by devres will be
>>> removed. Thus I have to make sure any such flushing is done before
>>> leaving so the next flush is stopped by the drm_dev_is_unplugged() check.
>>> I don't see any other way of ensuring that.
>>>
>>> I see now that I should move the call to drm_atomic_helper_shutdown()
>>> after drm_dev_unplug() to properly protect the pipe .enable/.disable
>>> callbacks.
>> Hm, calling _shutdown when the hw is gone already won't end well.
>> Fundamentally this race exists for all use-cases, and I'm somewhat
>> leaning towards plugging it in the core.
>>
>> The general solution probably involves something that smells a lot
>> like srcu, i.e. at every possible entry point into a drm driver
>> (ioctl, fbdev, dma-buf sharing, everything really) we take that
>> super-cheap read-side look, and drop it when we leave.
> That's similar to what we plan to do in V4L2. The idea is to set a device
> removed flag at the beginning of the .remove() handler and wait for all
> pending operations to complete. The core will reject any new operation when
> the flag is set. To wait for completion, every entry point would increase a
> use count, and decrease it on exit. When the use count is decreased to 0
> waiters will be woken up. This should solve the unplug/user race.

Ah, such a simple solution, easy to understand and difficult to get wrong!
And it's even nestable, no danger of deadlocking.

Maybe I can use it with tinydrm:

* @dev_use: Tracks use of functions acessing the parent device.
*           If it is zero, the device is gone. See ...
struct tinydrm_device {
     atomic_t dev_use;
};

/**
  * tinydrm_dev_enter - Enter device accessing function
  * @tdev: tinydrm device
  *
  * This function protects against using a device and it's resources after
  * it's removed. Should be called at the beginning of the function.
  *
  * Returns:
  * False if the device is still present, true if it is gone.
  */
static inline bool tinydrm_dev_enter(struct tinydrm_device *tdev)
{
     return !atomic_inc_not_zero(&tdev->dev_use);
}

static inline void tinydrm_dev_exit(struct tinydrm_device *tdev)
{
     atomic_dec(&tdev->dev_use);
}

static inline bool tinydrm_is_unplugged(struct tinydrm_device *tdev)
{
     bool ret = !atomic_read(&tdev->dev_use);
     smp_rmb();
     return ret;
}


static int tinydrm_init(...)
{
     /* initialize */

     /* Set device is present */
     atomic_set(&tdev->dev_use, 1);
}

static void tinydrm_unregister(...)
{
     /* Set device gone */
     atomic_dec(&tdev->dev_use);

     /* Wait for all device facing functions to finish */
     while (!tinydrm_is_unplugged(tdev)) {
         cond_resched();
     }

     /* proceed with unregistering */
}

static int mipi_dbi_fb_dirty(...)
{
     if (tinydrm_dev_enter(tdev))
         return -ENODEV;

     /* flush framebuffer */

     tinydrm_dev_exit(tdev);
}


>> Then on unplug we call synchroize_srcu, which essentially does what
>> your lock grab&drop trick does. Just much less overhead on the read
>> side (we can sprinkle this over everything without worrying about
>> wasting cpu time), and a bit clearer in the intention on the _unplug
>> side.
> Looks like srcu already implements what we need. I'm not sure we need an RCU-
> based solution though, it might be too complex.
>
>> But yeah, another thing that'll be a pile of work to fix properly. I'd
>> leave it at a FIXME comment in drm_dev_unplug() until we get to it.
>> And for now not worry about all the possible races. One thing at a
>> time and all that.
>>
>>>>>    }
>>>>>    
>>>>>    static void devm_tinydrm_register_release(void *data)
>>>>> diff --git a/drivers/gpu/drm/tinydrm/mi0283qt.c
>>>>> b/drivers/gpu/drm/tinydrm/mi0283qt.c
>>>>> index 2465489..84ab8d1 100644
>>>>> --- a/drivers/gpu/drm/tinydrm/mi0283qt.c
>>>>> +++ b/drivers/gpu/drm/tinydrm/mi0283qt.c
>>>>> @@ -31,6 +31,9 @@ static void mi0283qt_enable(struct
>>>>> drm_simple_display_pipe *pipe,
>>>>>
>>>>>          DRM_DEBUG_KMS("\n");
>>>>>    
>>>>> +       if (drm_dev_is_unplugged(&tdev->drm))
>>>>> +               return;
>>>>> +
>>>>>          ret = regulator_enable(mipi->regulator);
>>>>>          if (ret) {
>>>>>                  dev_err(dev, "Failed to enable regulator (%d)\n", ret);
>>>>> @@ -133,6 +136,7 @@ static struct drm_driver mi0283qt_driver = {
>>>>>                                    DRIVER_ATOMIC,
>>>>>          .fops                   = &mi0283qt_fops,
>>>>>          TINYDRM_GEM_DRIVER_OPS,
>>>>> +       .release                = tinydrm_release,
> Should this be included in the TINYDRM_GEM_DRIVER_OPS macro ?

I did that first, then I saw the name GEM OPS, and put it outside :-)

Noralf.

>>>>>          .lastclose              = tinydrm_lastclose,
>>>>>          .debugfs_init           = mipi_dbi_debugfs_init,
>>>>>          .name                   = "mi0283qt",
> [snip]
>



More information about the dri-devel mailing list