[PATCH v2] drm: do not sleep on vblank while holding a mutex
Ilija Hadzic
ihadzic at research.bell-labs.com
Fri Oct 28 19:44:54 PDT 2011
On Sat, 29 Oct 2011, Daniel Vetter wrote:
>> + /* the lock protects this section from itself executed in */
>> + /* the context of another PID, ensuring that the process that */
>> + /* wrote dev->last_vblank_wait is really the last process */
>> + /* that was here; processes waiting on different CRTCs */
>> + /* do not need to be interlocked, but rather than introducing */
>> + /* a new per-crtc lock, we reuse vbl_lock for simplicity */
>
> Multiline comments are done with one pair of /* */, see CodingStyle
>
Will fix the multiline comments (though scripts/checkpatch.pl didn't
complain, so I figured it was OK).
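For reference, collapsed into a single block per CodingStyle the comment
would read something like:

/*
 * The lock protects this section from itself executed in the context of
 * another PID, ensuring that the process that wrote dev->last_vblank_wait
 * is really the last process that was here; processes waiting on
 * different CRTCs do not need to be interlocked, but rather than
 * introducing a new per-crtc lock, we reuse vbl_lock for simplicity.
 */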
>
> I personally vote for no additional locking at all here and just drop the
> global lock. Or maybe I'm blind and we need this. In that case, please
> clue me up.
>
What I am doing with the lock is making sure that the last vblank wait
reported is really the last one. You said in your previous comment (which
I agree with) that the value of having last_vblank_wait in debugfs is in
the case of hangs. In that case you want to see who really was the last
one to issue a vblank wait.
Now suppose that drm_wait_vblank is entered in the context of two
different PIDs and that there are no locks. Say the first process wants
to wait on vblank N and the second wants to wait on vblank N+1. The first
process can reach the point just before it writes to last_vblank_wait and
lose the processor there, so it has N calculated in vblank.request (on
its stack) but has not yet written it into last_vblank_wait. Then the
second process gets the CPU, runs all the way through, writes N+1 into
last_vblank_wait, and schedules away (it called DRM_WAIT_ON). Now the
first process resumes where it left off and overwrites the N+1 in
last_vblank_wait with N, which is now stale.
I explained in the comment (and in my note this morning) that the only
race condition is drm_wait_vblank against itself in the context of
different processes. The above is the scenario.
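To make the fix concrete, here is a rough sketch of the critical section;
the names (dev->vbl_lock, dev->last_vblank_wait[crtc],
vblwait->request.sequence) follow the thread, but the surrounding code is
approximated, not quoted from the patch:

	unsigned long irqflags;

	spin_lock_irqsave(&dev->vbl_lock, irqflags);
	/*
	 * Without the lock, the first process could compute its request
	 * sequence (N), lose the CPU right here, and later overwrite the
	 * N+1 that the second process has already recorded, leaving a
	 * stale value behind.  Holding vbl_lock across the store makes
	 * "last" really mean last.
	 */
	dev->last_vblank_wait[crtc] = vblwait->request.sequence;
	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);

	/* the actual sleep (DRM_WAIT_ON) happens after the lock is dropped */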
A race against drm_vblank_info exists in theory, but in practice it
doesn't matter, because the reader of debugfs (or the proc file system)
runs asynchronously from, and typically at a slower pace than, the
processes issuing vblank waits (if it's a human looking at the file
system, then definitely at a much slower pace). Since processes issuing
vblank waits keep overwriting the same last_vblank_wait value,
drm_vblank_info is already doing a form of downsampling and debugfs is
losing information anyway, so the lock buys nothing there. In the case of
a hang, the value of last_vblank_wait stops changing, so again a lock in
drm_vblank_info wouldn't change anything. Adding one there (which would
have to be vbl_lock to be of any use) is only a matter of theoretical
correctness with no practical value, so I decided not to touch
drm_vblank_info.
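For context, the reader side boils down to something like the following
(an approximation of the per-CRTC dump in drm_vblank_info, not its
literal code); it just prints whatever value happens to be in
last_vblank_wait at the moment it runs:

	int crtc;

	for (crtc = 0; crtc < dev->num_crtcs; crtc++)
		seq_printf(m, "CRTC %d last wait: %u\n",
			   crtc, dev->last_vblank_wait[crtc]);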
drm_wait_vblank, however, needs protection from itself, as explained
above.
>> DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
>> (((drm_vblank_count(dev, crtc) -
>> vblwait->request.sequence) <= (1 << 23)) ||
>
> One line below there's the curios case of us rechecking dev->irq_enabled.
> Without any locking in place. But afaics this is only changed by the
> driver at setup and teardown when we are guaranteed (hopefully) that
> there's no open fd and hence no way to call an ioctl. So checking once at
> the beginning should be good enough. Whatever.
It's probably a redundant check, because this point won't be reached if
irq_enabled is false and there is nothing that changes it to true at
runtime, but that's a topic for a different patch.
-- Ilija