[PATCH 2/2] DRI2: Add error message when working around driver bug

Wed Oct 27 08:15:28 PDT 2010

On Oct 26, 2010, at 7:08 PM, Jesse Barnes wrote:

> On Tue, 26 Oct 2010 19:19:11 +0300
> Pauli Nieminen <ext-pauli.nieminen at nokia.com> wrote:
>> SGI_video_sync:
>> "The kernel maintains a video sync counter for
>> each physical hardware pipe in a system; the counter is incremented
>> upon the completion of the display of each full frame of video  
>> data. An
>> OpenGL context always corresponds to a pipe."
>
> Right, this is the rule we break; we don't have a 1:1 correspondence
> between gfx pipes and display pipes.
>
>> "The single video sync counter is shared by all GLXContexts."
>>
>> Yes. You have to extent both OML_sync_control and SGI_video_sync  
>> to expose
>> separate MSC for different CRTCs.
>>
>> I can see race condition even with events.
>>
>> * Client checks for event (none yet in queue)
>> * Client prepares to call some blocking msc call
>> * Window manager decides to move the window
>> * xserver send MSC change event
>> * Client calls blocking MSC call
>>
>> But maybe there is solution. All blocking calls could fail if MSC  
>> base
>> changes. Client would have to query for new base and rate before  
>> trying to
>> issue same request again.
>
> Yeah, that might work; I agree there's a race even with events that
> we'll need to handle.  But even with that race handled I'd like the
> server to fail gracefully rather than hanging the app if an old MSC
> value is passed in (though in that case we could print an error  
> message
> since we could assume an app error as long as the app was using the
> extended version of the spec).
>
>> For swap interval it would be enough if DDX would notify DRI2  
>> about crtc
>> changes with offset (msc_for_new_crtc - msc_for_old_crtc) that can  
>> be applied
>> to swaptarget.
>
> Yeah, just jumping to the new value might be ok in general.  Hardcore
> libraries like Mario's can do something fancier with the extensions
> above to avoid unintended behavior.
>

I agree. Pauli's offset fix would fix the common glXSwapBuffers()  
case for now and make most apps happy. We should do that. My current  
hack works fine (due to the underlying vsync'ing of the ddx's) as  
long as an app uses glXSwapBuffers and has swap_interval left at its  
default of 1. I'd expect most apps to do that, as it was the only  
supported setting (except for 0) for a long time, and all other  
operating systems i know of (Linux + proprietary drivers, all  
Windows, all OS/X) only obey a swap_interval = 0 or 1, but this fix  
would fix it for all swap_interval settings.

For the oml_sync_control case i see these options, and each of them  
makes me dizzy and uneasy in a different way, probably because i  
didn't think it through clearly:

a) As Michel Daenzer proposed earlier, give each drawable its own  
virtualized msc. Initialize it at drawable creation time to the true  
msc of the initial crtc. Then just use the current msc and info about  
crtc transitions to update the virtualized msc. This way we'd be  
probably closest to the spirit of the current spec, all msc's would  
always increase monotonically instead of jumping back and forth and  
crtc transitions would be handled properly without the app even  
noticing or needing to care. If multiple drawable's are created on  
the same crtc and stay there, they'll have the same msc's,  so  
bufferswaps, waits and other events can be synchronized across  
drawables and timestamps and counts compared meaningfully between  
them. That would be nice to have.

Downside: As soon as a drawable moves away to another crtc with  
different count, this beauty breaks down and the app would have large  
problems finding out what just happened to it and how to relate the  
msc's of different drawables to each other again. Possibly impossible  
to recover with all those virtualized counts.

Also it's tricky to implement. We would need to translate forward and  
backward between hardware msc's and virtualized msc each time we get  
any event from the kernel or schedule one and keep track of which  
crtc was assigned when all the time, also in all queued vblank events  
and all returned vblank and swap events.

The fact that we currently have 64 bit msc counters in userspace and  
spec, but only < 32 bit counters in kernel space and all the  
wraparound issues doesn't make it simpler to get right and race-free.

b) Extend the spec:  If a crtc transition is detected, make all calls  
that rely on the msc (glXSwapBuffersMsc(), glXWaitForMsc()) fail with  
some error code until the app has called glXGetSyncValuesOML() again  
to fetch the new, updated msc for the new crtc.

I like this because i think it is simpler to implement correctly and  
because the apps still see what is really going on. Toolkits like  
mine which require very tight (=paranoid) control about timing don't  
like too much abstraction and virtualization.

Downside: Less elegant? What if multiple threads (somewhat likely)  
will use blocking calls on the same drawable? One thread might query  
the msc and clear the 'dirty bit', but other threads may not notice?  
Need a per-thread dirty bit??

Or even have some dedicated function that the app needs to call to  
acknowledge it realizes what just happened? Or a way for apps to tell  
the server how they want to be treated in such cases?

E.g., one could have a transition counter for each drawable which  
increments at each crtc transition and a way for the app to query the  
current count and to pass the count with each blocking call,  
basically as a cookie? The implementation could compare counts to  
know if an app is aware at function call time that the crtc has  
changed and fail a call if the counts mismatch.

c) Extend the spec with new possible error cases to watch out:  
Unblock such calls in some slightly ugly manner to at least avoid an  
application hang, e.g., just . Somehow notify the application of what  
happened (error return code, or deriveable from the returned  
timestamps/counts/intel_swap_events?), maybe print some warning to  
the server log? How to decide which call after a transition should be  
unblocked and which are valid calls again?

Ideas?

> Thanks again for all your work on this.  These are good improvements.
>

Indeed!
-mario

*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.kleiner at tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:    +49 (0)7071/601-616
www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)