[PATCH v2] drm: do not sleep on vblank while holding a mutex
Ilija Hadzic
ihadzic at research.bell-labs.com
Fri Oct 28 19:44:54 PDT 2011
On Sat, 29 Oct 2011, Daniel Vetter wrote:
>> + /* the lock protects this section from itself executed in */
>> + /* the context of another PID, ensuring that the process that */
>> + /* wrote dev->last_vblank_wait is really the last process */
>> + /* that was here; processes waiting on different CRTCs */
>> + /* do not need to be interlocked, but rather than introducing */
>> + /* a new per-crtc lock, we reuse vbl_lock for simplicity */
>
> Multiline comments are done with one pair of /* */, see CodingStyle
>
Will fix the multiline comments (though scripts/checkpatch.pl didn't
complain, so I figured it was OK).
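For reference, collapsed into a single block per CodingStyle the comment
would read something like:

/*
 * The lock protects this section from itself executed in the context of
 * another PID, ensuring that the process that wrote dev->last_vblank_wait
 * is really the last process that was here; processes waiting on
 * different CRTCs do not need to be interlocked, but rather than
 * introducing a new per-crtc lock, we reuse vbl_lock for simplicity.
 */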
>
> I personally vote for no additional locking at all here and just drop the
> global lock. Or maybe I'm blind and we need this. In that case, please
> clue me up.
>
What I am doing with the lock is making sure that the last vblank wait
reported is really the last one. You said in your previous comment (which
I agree with) that the value of having last_vblank_wait in debugfs is in
the case of hangs. In that case you want to see who really was the last
one to issue a vblank wait.
Now suppose that drm_wait_vblank is entered in the context of two
different PIDs and that there are no locks. Say the first process wants
to wait on vblank N and the second wants to wait on vblank N+1. The first
process can reach the point just before it writes to last_vblank_wait and
lose the processor there, so it has N calculated in vblank.request (on
its stack) but has not yet written it into last_vblank_wait. Then the
second process gets the CPU, runs all the way through, writes N+1 into
last_vblank_wait, and schedules away (it called DRM_WAIT_ON). Now the
first process resumes where it left off and overwrites the N+1 in
last_vblank_wait with N, which is now stale.
I explained in the comment (and in my note this morning) that the only
race condition is drm_wait_vblank against itself in the context of
different processes. The above is the scenario.
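To make the fix concrete, here is a rough sketch of the critical section;
the names (dev->vbl_lock, dev->last_vblank_wait[crtc],
vblwait->request.sequence) follow the thread, but the surrounding code is
approximated, not quoted from the patch:

	unsigned long irqflags;

	spin_lock_irqsave(&dev->vbl_lock, irqflags);
	/*
	 * Without the lock, the first process could compute its request
	 * sequence (N), lose the CPU right here, and later overwrite the
	 * N+1 that the second process has already recorded, leaving a
	 * stale value behind.  Holding vbl_lock across the store makes
	 * "last" really mean last.
	 */
	dev->last_vblank_wait[crtc] = vblwait->request.sequence;
	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);

	/* the actual sleep (DRM_WAIT_ON) happens after the lock is dropped */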
A race against drm_vblank_info exists in theory, but in practice it
doesn't matter, because the reader of debugfs (or the proc file system)
runs asynchronously from, and typically at a slower pace than, the
processes issuing vblank waits (if it's a human looking at the file
system, then definitely at a much slower pace). Since processes issuing
vblank waits keep overwriting the same last_vblank_wait value,
drm_vblank_info is already doing a form of downsampling and debugfs is
losing information anyway, so the lock buys nothing there. In the case of
a hang, the value of last_vblank_wait stops changing, so again a lock in
drm_vblank_info wouldn't change anything. Adding one there (which would
have to be vbl_lock to be of any use) is only a matter of theoretical
correctness with no practical value, so I decided not to touch
drm_vblank_info.
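For context, the reader side boils down to something like the following
(an approximation of the per-CRTC dump in drm_vblank_info, not its
literal code); it just prints whatever value happens to be in
last_vblank_wait at the moment it runs:

	int crtc;

	for (crtc = 0; crtc < dev->num_crtcs; crtc++)
		seq_printf(m, "CRTC %d last wait: %u\n",
			   crtc, dev->last_vblank_wait[crtc]);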
drm_wait_vblank, however, needs protection from itself, as explained
above.
>> DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
>> (((drm_vblank_count(dev, crtc) -
>> vblwait->request.sequence) <= (1 << 23)) ||
>
> One line below there's the curios case of us rechecking dev->irq_enabled.
> Without any locking in place. But afaics this is only changed by the
> driver at setup and teardown when we are guaranteed (hopefully) that
> there's no open fd and hence no way to call an ioctl. So checking once at
> the beginning should be good enough. Whatever.
It's probably a redundant check, because this point won't be reached if
irq_enabled is false and there is nothing that changes it to true at
runtime, but that's a topic for a different patch.
-- Ilija