[PATCH 2/2] DRI2: Add error message when working around driver bug

Jesse Barnes jbarnes at virtuousgeek.org
Wed Oct 27 08:57:36 PDT 2010


On Wed, 27 Oct 2010 17:15:28 +0200
Mario Kleiner <mario.kleiner at tuebingen.mpg.de> wrote:

> On Oct 26, 2010, at 7:08 PM, Jesse Barnes wrote:
> 
> > On Tue, 26 Oct 2010 19:19:11 +0300
> > Pauli Nieminen <ext-pauli.nieminen at nokia.com> wrote:
> >> SGI_video_sync:
> >> "The kernel maintains a video sync counter for each physical
> >> hardware pipe in a system; the counter is incremented upon the
> >> completion of the display of each full frame of video data. An
> >> OpenGL context always corresponds to a pipe."
> >
> > Right, this is the rule we break; we don't have a 1:1 correspondence
> > between gfx pipes and display pipes.
> >
> >> "The single video sync counter is shared by all GLXContexts."
> >>
> >> Yes. You have to extend both OML_sync_control and SGI_video_sync
> >> to expose separate MSCs for different CRTCs.
> >>
> >> I can see a race condition even with events.
> >>
> >> * Client checks for events (none yet in queue)
> >> * Client prepares to issue some blocking MSC call
> >> * Window manager decides to move the window
> >> * xserver sends an MSC change event
> >> * Client issues the blocking MSC call
> >>
> >> But maybe there is a solution. All blocking calls could fail if
> >> the MSC base changes. The client would have to query for the new
> >> base and rate before trying to issue the same request again.
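> >>
> >> A rough client-side sketch of that retry loop, for a Display *dpy
> >> and GLXDrawable drawable already in scope (assuming,
> >> hypothetically, that the blocking call returns False when the MSC
> >> base has changed; glXWaitForMscOML() has no such failure mode
> >> today):
> >>
> >>   int64_t ust, msc, sbc;
> >>
> >>   /* Fetch the current MSC for the drawable's CRTC. */
> >>   glXGetSyncValuesOML(dpy, drawable, &ust, &msc, &sbc);
> >>
> >>   /* Wait for a target 10 frames out; a failure would mean the
> >>    * drawable moved to a CRTC with a different MSC base, so
> >>    * re-query and recompute the target before retrying. */
> >>   while (!glXWaitForMscOML(dpy, drawable, msc + 10, 0, 0,
> >>                            &ust, &msc, &sbc))
> >>       glXGetSyncValuesOML(dpy, drawable, &ust, &msc, &sbc);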
> >
> > Yeah, that might work; I agree there's a race even with events that
> > we'll need to handle.  But even with that race handled I'd like the
> > server to fail gracefully rather than hanging the app if an old MSC
> > value is passed in (though in that case we could print an error
> > message, since we could assume an app error as long as the app was
> > using the extended version of the spec).
> >
> >> For swap interval it would be enough if the DDX notified DRI2
> >> about CRTC changes with an offset (msc_for_new_crtc -
> >> msc_for_old_crtc) that can be applied to the swap target.
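> >>
> >> Roughly like this (hypothetical hook and helper names, just to
> >> make the arithmetic concrete):
> >>
> >>   /* Called by the DDX when a drawable's CRTC changes; DRI2
> >>    * shifts every pending swap target by the difference between
> >>    * the two hardware counters.  Unsigned wraparound makes the
> >>    * "negative offset" case work out, too. */
> >>   void
> >>   dri2_drawable_crtc_changed(DrawablePtr draw,
> >>                              CARD64 msc_for_new_crtc,
> >>                              CARD64 msc_for_old_crtc)
> >>   {
> >>       CARD64 offset = msc_for_new_crtc - msc_for_old_crtc;
> >>       DRI2SwapPtr swap;
> >>
> >>       for (swap = pending_swaps(draw); swap; swap = swap->next)
> >>           swap->target_msc += offset;
> >>   }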
> >
> > Yeah, just jumping to the new value might be ok in general.  Hardcore
> > libraries like Mario's can do something fancier with the extensions
> > above to avoid unintended behavior.
> >
> 
> I agree. Pauli's offset fix would fix the common glXSwapBuffers()
> case for now and make most apps happy. We should do that. My current
> hack works fine (due to the underlying vsync'ing of the DDXs) as
> long as an app uses glXSwapBuffers and has swap_interval left at its
> default of 1. I'd expect most apps to do that, as it was the only
> supported setting (except for 0) for a long time, and all the other
> operating systems I know of (Linux + proprietary drivers, all
> Windows, all OS X) only obey a swap_interval of 0 or 1, but this fix
> would fix it for all swap_interval settings.
> 
> For the OML_sync_control case I see these options, and each of them
> makes me dizzy and uneasy in a different way, probably because I
> haven't thought it through clearly:
> 
> a) As Michel Daenzer proposed earlier, give each drawable its own
> virtualized MSC. Initialize it at drawable creation time to the true
> MSC of the initial CRTC. Then just use the current MSC and info about
> CRTC transitions to update the virtualized MSC. This way we'd
> probably be closest to the spirit of the current spec: all MSCs would
> always increase monotonically instead of jumping back and forth, and
> CRTC transitions would be handled properly without the app even
> noticing or needing to care. If multiple drawables are created on
> the same CRTC and stay there, they'll have the same MSCs, so buffer
> swaps, waits and other events can be synchronized across drawables,
> and timestamps and counts compared meaningfully between them. That
> would be nice to have.
> 
> Downside: As soon as a drawable moves to another CRTC with a
> different count, this beauty breaks down, and the app would have a
> hard time finding out what just happened to it and how to relate the
> MSCs of different drawables to each other again. Possibly impossible
> to recover with all those virtualized counts.
> 
> Also it's tricky to implement. We would need to translate back and
> forth between hardware MSCs and the virtualized MSC each time we get
> an event from the kernel or schedule one, and keep track of which
> CRTC was assigned when at all times, including in all queued vblank
> events and all returned vblank and swap events.
> 
> The fact that we currently have 64 bit MSC counters in userspace and
> in the spec, but only < 32 bit counters in kernel space, plus all the
> wraparound issues, doesn't make it simpler to get right and race-free.
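>
> A sketch of what the per-drawable bookkeeping might look like (all
> names hypothetical; for simplicity this assumes a full 32 bit kernel
> counter):
>
>   #include <stdint.h>
>
>   struct virt_msc {
>       uint64_t last_hw;  /* last hw count, extended to 64 bit */
>       uint64_t offset;   /* virtual_msc = extended_hw_msc + offset */
>   };
>
>   /* Extend the kernel's 32 bit count to 64 bit by detecting wrap. */
>   static uint64_t extend_hw_msc(struct virt_msc *v, uint32_t hw32)
>   {
>       uint64_t hw = (v->last_hw & ~(uint64_t)0xffffffff) | hw32;
>       if (hw < v->last_hw)                /* counter wrapped */
>           hw += (uint64_t)1 << 32;
>       return v->last_hw = hw;
>   }
>
>   uint64_t virt_msc_now(struct virt_msc *v, uint32_t hw32)
>   {
>       return extend_hw_msc(v, hw32) + v->offset;
>   }
>
>   /* On a CRTC transition, rebase the offset so the virtual MSC
>    * continues monotonically from its current value (unsigned
>    * wraparound handles a "negative" offset). */
>   void virt_msc_crtc_change(struct virt_msc *v, uint32_t new_hw32)
>   {
>       uint64_t current = v->last_hw + v->offset;
>       v->last_hw = new_hw32;
>       v->offset = current - new_hw32;
>   }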
> 
> b) Extend the spec: If a CRTC transition is detected, make all calls
> that rely on the MSC (glXSwapBuffersMscOML(), glXWaitForMscOML())
> fail with some error code until the app has called
> glXGetSyncValuesOML() again to fetch the new, updated MSC for the
> new CRTC.
> 
> I like this because I think it is simpler to implement correctly and
> because the apps still see what is really going on. Toolkits like
> mine, which require very tight (= paranoid) control over timing,
> don't like too much abstraction and virtualization.
> 
> Downside: Less elegant? What if multiple threads (somewhat likely)
> use blocking calls on the same drawable? One thread might query the
> MSC and clear the 'dirty bit', but the other threads may not notice.
> Need a per-thread dirty bit?
> 
> Or even have some dedicated function that the app needs to call to
> acknowledge that it realizes what just happened? Or a way for apps to
> tell the server how they want to be treated in such cases?
> 
> E.g., one could have a transition counter for each drawable which
> increments at each CRTC transition, and a way for the app to query
> the current count and to pass the count with each blocking call,
> basically as a cookie. The implementation could compare counts to
> know whether the app is aware, at call time, that the CRTC has
> changed, and fail the call if the counts mismatch.
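>
> In pseudo-API form (every name below is made up, just to make the
> cookie idea concrete; assume target_msc was computed from an earlier
> glXGetSyncValuesOML() query):
>
>   int64_t ust, msc, sbc;
>
>   /* App fetches the drawable's current transition count... */
>   uint32_t cookie = glXGetCrtcTransitionCountEXT(dpy, drawable);
>
>   /* ...and passes it back with each blocking call.  The server
>    * fails the request if the drawable's count no longer matches,
>    * which tells the app the CRTC changed under it. */
>   if (!glXWaitForMscCookieEXT(dpy, drawable, target_msc, 0, 0,
>                               cookie, &ust, &msc, &sbc)) {
>       /* Stale cookie: re-query the sync values and the transition
>        * count, recompute target_msc, then retry. */
>   }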
> 
> c) Extend the spec with new possible error cases to watch out for:
> Unblock such calls in some slightly ugly manner to at least avoid an
> application hang. Somehow notify the application of what happened
> (error return code, or derivable from the returned
> timestamps/counts/intel_swap_events?), and maybe print some warning
> to the server log? How to decide which calls after a transition
> should be unblocked and which are valid calls again?
> 
> Ideas?

I like option (b) better.  Since our OML support is very recent, there
shouldn't be too many apps depending on the current behavior on Linux.
So rather than bending our implementation to match a spec that doesn't
really fit, I'd rather fix the spec to work better in a dynamic
environment, with whatever that entails on the implementation side.

Most of what you have in (b) is pretty straightforward; even the
shared drawable case shouldn't be too bad, since each X connection
could have bits indicating whether the counter has been picked up
after a CRTC move.
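
A sketch of that bookkeeping in the server (hypothetical DRI2 field
and function names; ClientPtr, MAXCLIENTS etc. are the usual server
types):

    /* In the DRI2 drawable private: one "picked up" bit per
     * connection. */
    Bool msc_acked[MAXCLIENTS];

    /* A CRTC move invalidates every client's view of the MSC... */
    static void crtc_moved(DRI2DrawablePtr priv)
    {
        memset(priv->msc_acked, 0, sizeof(priv->msc_acked));
    }

    /* ...GetMSC (glXGetSyncValuesOML) marks the caller up to date... */
    static void get_msc(ClientPtr client, DRI2DrawablePtr priv)
    {
        priv->msc_acked[client->index] = TRUE;
        /* ... fill in ust/msc/sbc as usual ... */
    }

    /* ...and blocking requests from stale clients fail instead of
     * sleeping on an MSC target that may never arrive. */
    static int wait_msc(ClientPtr client, DRI2DrawablePtr priv)
    {
        if (!priv->msc_acked[client->index])
            return BadValue;       /* or a new DRI2-specific error */
        /* ... queue the vblank event as usual ... */
        return Success;
    }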

I don't think it's entirely incompatible with the virtualized MSC idea
either; we may want to do that anyway to provide the full 64 bit space
to clients.  It's what we did in Mesa and it worked ok once we had all
the wrapping and CRTC movement bugs fixed. :)

-- 
Jesse Barnes, Intel Open Source Technology Center

