[RFC] DRI2 synchronization and swap bits

Sun Nov 1 12:46:45 PST 2009

Hello everybody

My name is Mario Kleiner and I'm new to this list, so I apologize
in advance should I violate any rules of netiquette, state the
totally obvious, or if this post is considered off-topic or far too
long. Please tell me if so, and how to do better next time. First
some background on why I am posting, then some proposals more to the
point of this RFC.

I read this RFC and I'm very excited about the prospect of having
well-working support for the OML_sync_control extension in DRI2 on
Linux/X11. I have been hoping for this for years, so a big thank you
in advance! This is why I hope to provide some input from the
perspective of future "power users" of functions like
glXGetSyncValuesOML(), glXSwapBuffersMscOML() and glXWaitForSbcOML().
I'm the co-developer of a popular free-software toolkit (Psychtoolbox)
that is used mostly in the neuroscience / cognitive science community
by scientists to find out how the different senses (visual, auditory,
haptic, ...) work and how they work together. Our requirements on
graphics are often much more demanding than those of a video game, a
typical VR environment or a media player.

Our users often have very strict requirements for scheduling
frame-accurate and tear-free visual stimulus display, synchronizing
buffer swaps across display heads, and low-latency returns from swap
completion. Often they need swap-completion timestamps that are
available with the shortest possible delay after a successful swap
and accurately tied to the vblank at which scanout of the swapped
frame started. The need for timestamps with sub-millisecond accuracy
is not uncommon. Therefore, well-working OML_sync_control support
would be basically a dream come true and a very compelling feature
for Linux as a platform for cognitive science.

I spent the last 12 hours reading the CompositeSwap page in the DRI
Wiki, Jesse Barnes' git tree and the drivers/gpu/drm/drm_irq.c file in
the linux-next git tree at kernel.org, which I assume (correctly?)
reflects the current state of the art wrt. the DRM, and I have some
thoughts or wishes.

1. Wrt. "2) DRI2WaitMSC/SBC a) Concern about blocking the client on
the server side as opposed to a client side wait."

I'm not sure how much extra latency is involved in blocking the client
on the server side instead of waiting on the client side, but I can
assure you that for our applications, 1 millisecond of extra delay
between swap completion and unblocking can make a significant
difference. Quite often certain actions need to be triggered in sync
with swap completion. Examples are starting recording equipment for
brain activity (fMRI, EEG, MEG, eye trackers) or other physiological
responses, starting sound playback or recording, sending trigger
packets over a network, driving special digital/analog I/O boards,
driving motion simulators, etc. So low-latency unblocking would be
much appreciated on our side.

2. On the CompositeSwap page in the DRI Wiki, there is this comment:
"...It seems that composited apps should never need to know about  
real world screen vblank issues, ... ....When dealing with a  
redirected window it seems it would be acceptable to come up with an  
entirely fake number for all existing extensions that care about  
vblanks.."

I don't like the idea of entirely fake numbers and would like to vote
for a solution that is as close as possible to the non-redirected
case. Most of our applications run in non-redirected, full-screen,
undecorated, page-flipped windows, i.e., without a compositor being
involved. I can think of a couple of future use cases, though, where
reasonably well-working redirected/composited windows would be very
useful for us, but only if we get meaningful timestamps and vblank
counts that are tied to the actual display onset.

3. The Wiki also mentions "The direct rendered cases outlined in the
implementation notes above are complete, but there's a bug in the
async glXSwapBuffers that sometimes causes clients to hang after
swapping rather than continue." Looking through the code of
<http://cgit.freedesktop.org/~jbarnes/xf86-video-intel/tree/src/i830_dri.c?id=a0e2e624c47516273fa3d260b86d8c293e2519e4>,
I can see that in I830DRI2SetupSwap() and I830DRI2SetupWaitMSC(), in
the "if (divisor == 0) { ...}" path, the functions return after
DRM_VBLANK_EVENT submission without executing
"*event_frame = vbl.reply.sequence;". This looks problematic to me,
as, if I understand correctly, the X server later passes event_frame
to DRI2AddFrameEvent() inside DRI2SwapBuffers() as a cookie to find
the right events for clients to wait on. Could this be a reason for
clients hanging after a swap? I found a few other spots where I either
misunderstood something or there are small bugs. What is the
appropriate way to report these?

4. According to the spec, the different OML_sync_control functions
return a UST timestamp which is supposed to reflect the exact time
when the MSC last incremented, i.e., the start of scanout of a new
video frame. SBC and MSC are supposed to increment
atomically/simultaneously at swap completion, so the UST in the
(UST,SBC,MSC) triplet is supposed to mark the time of the transition
of either the MSC alone, or of both MSC and SBC at swap completion.
This makes a lot of sense to me; it is exactly the type of timestamp
that our toolkit critically depends on.

Ideally the UST timestamp should be corrected to reflect the start of
scanout, but a UST that is consistently taken at vblank interrupt
time would do as well. In the current implementation these are *not*
the semantics we'd get for UST timestamps.

The I830DRI2GetMSC() call uses a call to drmWaitVBlank() and its  
returned vbl.reply.tval_sec and vbl.reply.tval_usec values for  
computing UST.
I830DRI2SetupSwap() and I830DRI2SetupWaitMSC() use drmWaitVBlank() to
queue vblank events via drm_queue_vblank_event(). Later on, UST is
computed from the timestamp contained in the dequeued events.

If you look at the drm_wait_vblank() and drm_queue_vblank_event()
functions in the current drm_irq.c inside the linux-next tree, you'll
find the following undesirable behaviour:

I830DRI2GetMSC -> drmWaitVBlank -> drm_wait_vblank: falls through
DRM_WAIT_ON without waiting, because the wait condition is already
satisfied, and calls do_gettimeofday(&now) for the UST timestamp. This
timestamp is not synchronized to the vblank at all!

I830DRI2SetupSwap() or I830DRI2SetupWaitMSC() -> drmWaitVBlank ->
drm_wait_vblank -> drm_queue_vblank_event for a certain
vblwait->request.sequence number. If this target sequence number has
not yet been reached, the event gets queued and later on timestamped
via do_gettimeofday() in drm_handle_vblank_events(), which is called
from the vblank irq handler --> exactly the behaviour we want! If,
however, the vblwait->request.sequence number has already been reached
in drm_queue_vblank_event(), then the routine retires the event
immediately and applies a do_gettimeofday() timestamp taken at that
moment, which results in a wrong UST timestamp.

Unreliable UST timestamps would make the whole OML_sync_control
extension almost useless for us and probably for other applications
that require good synchronization, e.g., between video and audio
streams, so I'd politely ask for improvements here.

I guess one way (simple from the viewpoint of a non-kernel hacker?)
would be to always timestamp the vblank in the drm_handle_vblank()
routine, immediately after incrementing the vblank_count, probably
protecting both the timestamp acquisition and the vblank increment
with one spinlock, so both get updated atomically? Then one could
maybe extend drm_vblank_count() to read out and return the vblank
count and the corresponding timestamp simultaneously under protection
of the lock? Or use any other way to provide the timestamp together
with the vblank count in an atomic fashion to the calling code in
drm_wait_vblank(), drm_queue_vblank_event() and
drm_handle_vblank_events()?

If you've read this far, thanks a lot for your attention and apologies
for the long post.
-mario

*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.kleiner at tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:    +49 (0)7071/601-616
www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)