[PATCH 3/3] drm: do not sleep on vblank while holding a mutex

Daniel Vetter daniel at ffwll.ch
Thu Oct 27 03:43:35 PDT 2011

On Wed, Oct 26, 2011 at 05:33:24PM -0500, Ilija Hadzic wrote:
> Is DRM support for other OS-es officially dead ? I know it's not in
> best shape on BSD and friends, but I am not sure if I should make it
> worse (or inconsistent with the current code shape and form).

I think the idea of sharing kernel drm code is pretty much dead, yeah.

> main()
> {
>     int fd;
>     drmVBlank vbl;
>     fd = open("/dev/dri/card0", 0);
>     printf("fd=%d\n", fd);
>     while(1) {
>         vbl.request.type =  DRM_VBLANK_RELATIVE ;
>         vbl.request.sequence = 60;
>         printf("waiting on vblank\n");
>         ioctl(fd, DRM_IOCTL_WAIT_VBLANK, &vbl);
>     }
> }
> Then start glxgears (on CRTC-0) and watch the hell break loose. The
> hostile code will cause any OpenGL application to hang for a full
> second before it can even enter the vblank_wait. Now because all
> vblank waits go through a common function in DDX, the whole
> windowing system will consequently go into a crawl. Run it with
> Compiz and things are even worse (the whole windowing system goes
> "stupid" as soon as the hostile application is started).

This is really just horrible.
> >>+	add_wait_queue(&(queue), &entry);			\
> >>+	mutex_unlock(&drm_global_mutex);			\
> >
> >Hiding a lock-dropping in this fashion is horrible.
> I explained above why am I have to release the lock inside the
> macro. That's equivalent to what you can find in the userland in
> POSIX threads, like I said. So that's not bad.

I disagree - this is just the blk magic reinvented. Also I don't like to
rush locking changes, because the hard part is reviewing all the drivers,
and I prefer to do that just once for the real solution.

Also explaining locking changes (especially in drm) starting with
"assuming the old code is correct, this is safe" isn't great. Because your
changes might simply enlarge the already existing race.

On a quick glance
- drm_vblank functions call by wait_vblank seem to do correct locking
  already using various spinlocks and atomic ops.
- linux waitqueues have their own locking
- dev->last_vblank_wait is only used for a procfs file. I don't think it
  makes much sense given that we can have more than one vblank waiter
- there's no selective wakeup going on, all waiters get woken for every
  vblank and call schedule again if it's not the right vblank

The only really hairy thing going on is racing modeset ioctls against
vblank waiters. But modeset is protected by the dev->mode_config.mutex
and hence already races against wait_vblank with the current code, so I'm
slightly inclined to ignore this for the moment. Would be great if you
coudl check that in-depth, too.

So I think the right thing to do is
- Kill dev->last_vblank_wait (in a prep patch).
- Imo we also have a few too many atomic ops going on, e.g.
  dev->vblank_refcount looks like it's atomic only because or the procfs
  file, so maybe kill that procfs file entirely.
- Audit the vblank locking, maybe resulting in a patch with updated
  comments, if there's anything misleading or tricky going on. And it's
  always good when the locking of structe members is explained in the
  header file ...
- Drop the mutex locking because it's not needed.

While locking at the code I also noticed that wait_vblank calls
drm_vblank_get, but not always drm_vblank_put. The code is correct, the
missing vblank_put is hidden in drm_queue_vblank_event. Can you also write
the patch to move that around to not trip up reviewers the next time

Yours, Daniel
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48

More information about the dri-devel mailing list