[PATCH 3/3] drm: do not sleep on vblank while holding a mutex

Ilija Hadzic ihadzic at research.bell-labs.com
Thu Oct 27 20:19:39 PDT 2011

On Thu, 27 Oct 2011, Daniel Vetter wrote:

> On a quick glance
> - drm_vblank functions call by wait_vblank seem to do correct locking
>  already using various spinlocks and atomic ops.
> - linux waitqueues have their own locking
> - dev->last_vblank_wait is only used for a procfs file. I don't think it
>  makes much sense given that we can have more than one vblank waiter
> - there's no selective wakeup going on, all waiters get woken for every
>  vblank and call schedule again if it's not the right vblank

I concur.

> The only really hairy thing going on is racing modeset ioctls against
> vblank waiters. But modeset is protected by the dev->mode_config.mutex
> and hence already races against wait_vblank with the current code, so I'm
> slightly inclined to ignore this for the moment. Would be great if you
> coudl check that in-depth, too.

Will leave this for some other patch at some other time; the critical path 
is to fix the hang/crawl that I explained in my previous note.

> So I think the right thing to do is
> - Kill dev->last_vblank_wait (in a prep patch).

Agreed. Also drm_vblank_info function can go away

> - Imo we also have a few too many atomic ops going on, e.g.
>  dev->vblank_refcount looks like it's atomic only because or the procfs
>  file, so maybe kill that procfs file entirely.

I am not sure about that. dev->vblank_refcount looks critical to me: it is 
incremented through drm_vblank_get which is called from several different 
contexts: page-flip functions in several drivers (which can run in the 
context of multiple user processes), **_pm_** stuff in radeon driver 
(which looks like it runs in the context of kernel worker thread). Mutexes 
used at all these different places are not quite consistent. If anything, 
as the result of a later audit, some other mutexes may be rendered 
unecessary, but definitely, I would keep vblank_refcount atomic.

> - Audit the vblank locking, maybe resulting in a patch with updated
>  comments, if there's anything misleading or tricky going on. And it's
>  always good when the locking of structe members is explained in the
>  header file ...

I'll add comments that I feel make sense for this set of patches (if 
anything). Full audit, should come later at a slower pace. There are quite 
a few mutexes and locks many of which are overlapping in function and some 
are spurious. It will take more than just myself to untangle this know.

> - Drop the mutex locking because it's not needed.

Agreed. Will drop.

> While locking at the code I also noticed that wait_vblank calls
> drm_vblank_get, but not always drm_vblank_put. The code is correct, the
> missing vblank_put is hidden in drm_queue_vblank_event. Can you also write
> the patch to move that around to not trip up reviewers the next time
> around?

Actually, there is something fishy here. Look at the 'else' branch under 
the 'if ((seq - vblwait->request.sequence) <= (1 << 23))' at the end of 
the drm_queue_vblank_event. It doesn't have the 'drm_vblank_put'. It looks 
wrong to me, but I need to first convince myself that there is not some 
other obscure place where drm_vblank_put is accounted for. If that branch 
is really missing a drm_vblank_put, then it will be easy pull the 
drm_vblank_put out. Otherwise, it will be another knot to untangle.

More information about the dri-devel mailing list