[PATCH 3/3] drm: do not sleep on vblank while holding a mutex

Wed Oct 26 15:33:24 PDT 2011

On Wed, 26 Oct 2011, Daniel Vetter wrote:

>
> While you mess around with this code, please use the standard linux
> wait_event_* infrastructure.

Right now the order of entering the queue is guaranteed to be the same as 
the order of entering the ioctl, which reflects the order of progressing 
vblank sequence. I wanted to preserve this semantic, so I need a wait 
function that puts the task into the queue and then unlocks the mutex 
(essentially a kernel equivalent of pthread_cond_wait that POSIX threads 
library implements).

The closest "approximation" of that wait_event_interruptable_locked, but 
that requires a spinlock, not mutex (and thus the rework to 
drm_wait_vblank ioctl would be more radical.

> I want to stop that DRM_FOO yelling from
> spreading around - the idea of the drm core being os agnostic died quite a
> while ago.
>

Is DRM support for other OS-es officially dead ? I know it's not in best 
shape on BSD and friends, but I am not sure if I should make it worse (or 
inconsistent with the current code shape and form).

Anyway, the primary reason for implementing the DRM_WAIT_ON_LOCKED
is because I didn't have the function that enqueues the task and then 
releases the mutex.

> Again, have you carefully checked whether this is safe and how the
> relevant data structures (vblank counting) are proctected?
>

I am only moving the global lock further down in the function. The 
structures moved out of the critical section are only local vars. and 
arguments. So it's safe. Had I changed the mutex to something other than 
global one (which is the right thing to do in a long run) then a more 
careful review would be warranted.

Also note that running drm_wait_ioctl as DRM_LOCKED is a severe problem 
that we have to address quickly. So I'd like to get *some* kind of decent 
fix in this merge window and then follow up with more polishing later.

For example, try compiling and running this code (which any bozo in the 
userland can do):

#include <stdio.h>
#include <xf86drm.h>

main()
{
     int fd;
     drmVBlank vbl;

     fd = open("/dev/dri/card0", 0);
     printf("fd=%d\n", fd);

     while(1) {
         vbl.request.type =  DRM_VBLANK_RELATIVE ;
         vbl.request.sequence = 60;
         printf("waiting on vblank\n");
         ioctl(fd, DRM_IOCTL_WAIT_VBLANK, &vbl);
     }
}

Then start glxgears (on CRTC-0) and watch the hell break loose. The 
hostile code will cause any OpenGL application to hang for a full second 
before it can even enter the vblank_wait. Now because all vblank waits go 
through a common function in DDX, the whole windowing system will 
consequently go into a crawl. Run it with Compiz and things are even worse 
(the whole windowing system goes "stupid" as soon as the hostile 
application is started).

>> +	add_wait_queue(&(queue), &entry);			\
>> +	mutex_unlock(&drm_global_mutex);			\
>
> Hiding a lock-dropping in this fashion is horrible.
>

I explained above why am I have to release the lock inside the macro. 
That's equivalent to what you can find in the userland in POSIX threads, 
like I said. So that's not bad.

What *is* bad is that I should have passed the mutex as an argument to 
DRM_WAIT_ON_LOCKED and that I'll fix.

-- Ilija