[PATCH 6/6] drm/tinydrm: Support device unplug

Daniel Vetter daniel at ffwll.ch
Mon Sep 4 15:20:04 UTC 2017


On Mon, Sep 04, 2017 at 02:30:05PM +0200, Noralf Trønnes wrote:
> 
> Den 04.09.2017 11.04, skrev Daniel Vetter:
> > On Mon, Sep 04, 2017 at 11:41:05AM +0300, Laurent Pinchart wrote:
> > > Hi Daniel,
> > > 
> > > On Monday, 4 September 2017 10:26:15 EEST Daniel Vetter wrote:
> > > > On Thu, Aug 31, 2017 at 09:22:03PM +0200, Noralf Trønnes wrote:
> > > > > Den 31.08.2017 14.59, skrev Laurent Pinchart:
> > > > > > On Wednesday, 30 August 2017 20:18:49 EEST Daniel Vetter wrote:
> > > > > > > On Wed, Aug 30, 2017 at 6:31 PM, Noralf Trønnes wrote:
> > > > > > > > Den 28.08.2017 23.56, skrev Daniel Vetter:
> > > > > > > > > On Mon, Aug 28, 2017 at 07:17:48PM +0200, Noralf Trønnes wrote:
> > > > > > > > > > +
> > > > > > > > > > +       drm_fbdev_cma_dev_unplug(tdev->fbdev_cma);
> > > > > > > > > > +       drm_dev_unplug(&tdev->drm);
> > > > > > > > > > +
> > > > > > > > > > +       /* Make sure framebuffer flushing is done */
> > > > > > > > > > +       mutex_lock(&tdev->dirty_lock);
> > > > > > > > > > +       mutex_unlock(&tdev->dirty_lock);
> > > > > > > > > Is this really needed? Or, doesn't it just paper over a driver bug
> > > > > > > > > you have already anyway, since native kms userspace can directly
> > > > > > > > > call fb->funcs->dirty too, and you already protect against that.
> > > > > > > > > 
> > > > > > > > > This definitely looks like the fbdev helper is leaking
> > > > > > > > > implementation details to callers where it shouldn't do that.
> > > > > > > > Flushing can happen while drm_dev_unplug() is called, and when we
> > > > > > > > leave this function the device facing resources controlled by devres
> > > > > > > > will be removed. Thus I have to make sure any such flushing is done
> > > > > > > > before leaving so the next flush is stopped by the
> > > > > > > > drm_dev_is_unplugged() check. I don't see any other way of ensuring
> > > > > > > > that.
> > > > > > > > 
> > > > > > > > I see now that I should move the call to
> > > > > > > > drm_atomic_helper_shutdown() after drm_dev_unplug() to properly
> > > > > > > > protect the pipe .enable/.disable callbacks.
> > > > > > > Hm, calling _shutdown when the hw is gone already won't end well.
> > > > > > > Fundamentally this race exists for all use-cases, and I'm somewhat
> > > > > > > leaning towards plugging it in the core.
> > > > > > > 
> > > > > > > The general solution probably involves something that smells a lot
> > > > > > > like srcu, i.e. at every possible entry point into a drm driver
> > > > > > > (ioctl, fbdev, dma-buf sharing, everything really) we take that
> > > > > > > super-cheap read-side look, and drop it when we leave.
> > > > > > That's similar to what we plan to do in V4L2. The idea is to set a
> > > > > > device removed flag at the beginning of the .remove() handler and wait
> > > > > > for all pending operations to complete. The core will reject any new
> > > > > > operation when the flag is set. To wait for completion, every entry
> > > > > > point would increase a use count, and decrease it on exit. When the use
> > > > > > count is decreased to 0 waiters will be woken up. This should solve the
> > > > > > unplug/user race.
> > > > > Ah, such a simple solution, easy to understand and difficult to get wrong!
> > > > > And it's even nestable, no danger of deadlocking.
> > > > > 
> > > > > Maybe I can use it with tinydrm:
> > > > > 
> > > > > * @dev_use: Tracks use of functions acessing the parent device.
> > > > > *           If it is zero, the device is gone. See ...
> > > > > struct tinydrm_device {
> > > > >      atomic_t dev_use;
> > > > > };
> > > > > 
> > > > > /**
> > > > >   * tinydrm_dev_enter - Enter device accessing function
> > > > >   * @tdev: tinydrm device
> > > > >   *
> > > > >   * This function protects against using a device and it's resources after
> > > > >   * it's removed. Should be called at the beginning of the function.
> > > > >   *
> > > > >   * Returns:
> > > > >   * False if the device is still present, true if it is gone.
> > > > >   */
> > > > > static inline bool tinydrm_dev_enter(struct tinydrm_device *tdev)
> > > > > {
> > > > >      return !atomic_inc_not_zero(&tdev->dev_use);
> > > > > }
> > > > > 
> > > > > static inline void tinydrm_dev_exit(struct tinydrm_device *tdev)
> > > > > {
> > > > >      atomic_dec(&tdev->dev_use);
> > > > > }
> > > > > 
> > > > > static inline bool tinydrm_is_unplugged(struct tinydrm_device *tdev)
> > > > > {
> > > > >      bool ret = !atomic_read(&tdev->dev_use);
> > > > >      smp_rmb();
> > > > >      return ret;
> > > > > }
> > > > > 
> > > > > 
> > > > > static int tinydrm_init(...)
> > > > > {
> > > > >      /* initialize */
> > > > > 
> > > > >      /* Set device is present */
> > > > >      atomic_set(&tdev->dev_use, 1);
> > > > > }
> > > > > 
> > > > > static void tinydrm_unregister(...)
> > > > > {
> > > > >      /* Set device gone */
> > > > >      atomic_dec(&tdev->dev_use);
> > > > > 
> > > > >      /* Wait for all device facing functions to finish */
> > > > >      while (!tinydrm_is_unplugged(tdev)) {
> > > > >          cond_resched();
> > > > >      }
> > > > > 
> > > > >      /* proceed with unregistering */
> > > > > }
> > > > > 
> > > > > static int mipi_dbi_fb_dirty(...)
> > > > > {
> > > > >      if (tinydrm_dev_enter(tdev))
> > > > >          return -ENODEV;
> > > > > 
> > > > >      /* flush framebuffer */
> > > > > 
> > > > >      tinydrm_dev_exit(tdev);
> > > > > }
> > > > Yup, expect imo this should be done in drm core (as much as possible, i.e.
> > > > for ioctl at least, standard sysfs), plus then enter/exit exported to
> > > > drivers for their stuff.
> > > > 
> > > > And it really needs to be srcu. For kms drivers the atomic_inc/dec wont
> > > > matter, for render drivers it will show up (some of our ioctls are 100%
> > > > lock less in the fastpath, not a single atomic op in there).
> > > Don't forget that we need to check the disconnect flag when entering ioctls to
> > > return -ENODEV. I'm open to clever solutions, but I'd first aim for
> > > correctness with a lock and then replace the internal implementation.
> > As Noralf pointed out, we already check for drm_dev_is_unplugged(). Maybe
> > not in all places, but that can be fixed.
> > 
> > And you can't first make all ioctl slower and then fix it up, at least not
> > spread over multiple patch series. I guess for developing, doing the
> > simpler atomic counter first is ok. But the same patch series needs to
> > move over to srcu at the end.
> 
> I've never used srcu/rcu before, care to explain a little bit more?
> Is it drm_device we are protecting here?
> 
> How can srcu be better for the fast path you're talking about if it's
> half of srcu (assuming that makes srcu slower)?

We're just protecting drm_device->unplugged with srcu, but the great thing
is that srcu_read_lock/unlock are extremely cheap, and still guarantee
that we can fully synchronize with writers.

Here's the rough sketch:

/* srcu is real cheap on the read side, we can have one for all of drm
DEFINE_STATIC_SRCU(drm_unplug_srcu);

drm_dev_enter()
{
	srcu_read_lock(&drm_srcu);
}

drm_dev_exit()
{
	srcu_read_unlock(&drm_srcu);
}

drm_dev_is_unplugged()
{
	/* checking unplugged state outside of holding the srcu read side
	 * lock is racy, catch this. */
	WARN_ON(!srcu_read_lock_head(&drm_unplug_srcu));

	/* with rcu we can demote this to a simple read, no need for any
	 * atomic or memory barries
	return dev->unplugged;
}

drm_dev_unplug()
{
	dev->unplugged = true;

	/* This is were the real magic is. After it finished any critical
	 * read section is guaranteed to see the new value of ->unplugged,
	 * and any critical section which might still have seen the old
	 * value of ->unplugged is guaranteed to have finished.
	 * It also takes like forever to complete, but that's kinda the
	 * point. */
	synchronize_srcu(&drm_unplug_srcu);
}

Excercise for the reader is poking holes into the simpler atomic_t based
approach and highlighting all the races. You probably need to reach the
complexity of srcu until it's really race free (and fast).

Oh, one more: please make sure you enable CONFIG_PROVE_LOCKING and
especilly all the rcu options in there when working on this, to make sure
we catch all the deadlocks and bugs. This is some really tricky locking
stuff.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the dri-devel mailing list