[PATCH 3/3] drm: do not sleep on vblank while holding a mutex

Daniel Vetter daniel at ffwll.ch
Fri Oct 28 02:30:38 PDT 2011

On Thu, Oct 27, 2011 at 10:19:39PM -0500, Ilija Hadzic wrote:
> On Thu, 27 Oct 2011, Daniel Vetter wrote:
> >The only really hairy thing going on is racing modeset ioctls against
> >vblank waiters. But modeset is protected by the dev->mode_config.mutex
> >and hence already races against wait_vblank with the current code, so I'm
> >slightly inclined to ignore this for the moment. Would be great if you
> >coudl check that in-depth, too.
> >
> Will leave this for some other patch at some other time; the
> critical path is to fix the hang/crawl that I explained in my
> previous note.

Agreed, one thing at a time. It's complicated enough as is.
> >So I think the right thing to do is
> >- Kill dev->last_vblank_wait (in a prep patch).
> Agreed. Also drm_vblank_info function can go away
> >- Imo we also have a few too many atomic ops going on, e.g.
> > dev->vblank_refcount looks like it's atomic only because or the procfs
> > file, so maybe kill that procfs file entirely.
> I am not sure about that. dev->vblank_refcount looks critical to me:
> it is incremented through drm_vblank_get which is called from
> several different contexts: page-flip functions in several drivers
> (which can run in the context of multiple user processes), **_pm_**
> stuff in radeon driver (which looks like it runs in the context of
> kernel worker thread). Mutexes used at all these different places
> are not quite consistent. If anything, as the result of a later
> audit, some other mutexes may be rendered unecessary, but
> definitely, I would keep vblank_refcount atomic.

I've re-read the code a bit and in _get it's protected by dev->vbl_lock,
but not so in _put (because we issue a timer call to actually switch it
off). I think we could just shove it under the protection of dev->vbl_lock
completely. Another fuzzzy area is the relation between dev->vbl_lock and
dev->vblank_time_lock. The latter looks a bit rendundant.

> >- Audit the vblank locking, maybe resulting in a patch with updated
> > comments, if there's anything misleading or tricky going on. And it's
> > always good when the locking of structe members is explained in the
> > header file ...
> I'll add comments that I feel make sense for this set of patches (if
> anything). Full audit, should come later at a slower pace. There are
> quite a few mutexes and locks many of which are overlapping in
> function and some are spurious. It will take more than just myself
> to untangle this know.

Yeah, agreed. Let's first drop the mutex around this and untangle the
spinlock/atomic mess in a second step. This is too hairy.

> >- Drop the mutex locking because it's not needed.
> >
> Agreed. Will drop.
> >While locking at the code I also noticed that wait_vblank calls
> >drm_vblank_get, but not always drm_vblank_put. The code is correct, the
> >missing vblank_put is hidden in drm_queue_vblank_event. Can you also write
> >the patch to move that around to not trip up reviewers the next time
> >around?
> >
> Actually, there is something fishy here. Look at the 'else' branch
> under the 'if ((seq - vblwait->request.sequence) <= (1 << 23))' at
> the end of the drm_queue_vblank_event. It doesn't have the
> 'drm_vblank_put'. It looks wrong to me, but I need to first convince
> myself that there is not some other obscure place where
> drm_vblank_put is accounted for. If that branch is really missing a
> drm_vblank_put, then it will be easy pull the drm_vblank_put out.
> Otherwise, it will be another knot to untangle.

I've re-read the code and agree with your follow-up analysis.

Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48

More information about the dri-devel mailing list