[PATCH] DRI2: fixup handling of last_swap_target

Sun Mar 7 22:47:38 PST 2010

On Mar 8, 2010, at 1:49 AM, Jesse Barnes wrote:
>>
>> In DRI2SwapBuffers() you have this code for glXSwapBuffersMscOML
>> handling:
>>
>> 	...
>>      } else {
>>          /* glXSwapBuffersMscOML could have a 0 target_msc, honor  
>> it */
>>          *swap_target = target_msc;
>>      }
>>
>> Now while my previous patch was to restrictive and didn't allow to
>> honor a 0 target_msc, this one is too liberal. It would allow a
>> client to schedule many swaps for the same target_msc value, e.g.,
>> for target_msc = 1000:
>>

Ok, forget about this. Leave it like it is, i.e. a simple assignment  
*swap_target = target_msc; no extra check. It can cause some problems  
with the semantic of the divisor and remainder stuff later on if we  
nudge the target_msc.

If we simply fix the 2nd issue by DRI2ThrottleClient() throttling the  
client at the beginning of DRI2SwapBuffers() as soon as a swap is  
already pending, then we get the correct fix for the above, and for  
ordering swaps correctly, for free :-)

>
> If the completions end up using blits, they'll just queue to the  
> kernel
> and may even stall the server until completion if the no-tearing  
> option
> is enabled.  Otherwise we may end up blocking in the kernel if the
> command ring is full.

Out of interest: The code in I830CopyRegion first submits the "wait  
for vblank" commands, followed by the copy region commands to the  
fifo, then calls intel_sync(). Does this intel_sync() block the  
server only until all batchbuffers are submitted to the gpu (and will  
execute at some time) or does it really block the server until they  
are completely processed - basically until the blit is really fully  
completed?

> effort.  The kernel has some code for handling that already in fact; I
> think vblank events will be delivered for counts in the past.

Yes, the kernel takes care. I think i understand the magic there now  
(my brain hurts - to many 32 bit wraparounds and signed math on  
unsigned ints). The kernel side of scheduling vblank events seems to  
be safe - even against vblank counter wraparounds  - as long as the  
requested count is not more than (2^32 - 2^23) approx. 4 billion  
counts ahead of the vblank count, or more than 2^23 = 8 million  
counts behind. This even works although the 64 bit values from the  
xserver get truncated to 32 bit when assigned to  
vbl.request.sequence, as far as i understand it.

The "behind" case can't happen due to the code in the Intel ddx. The  
"ahead" case one could guard against.

One remaining problem is that all returned vbl.reply.sequence values  
from the kernel are only 32 bit, but they're compared against the 64  
bit values from the client. Without a virtualized 64 bit vblank count  
in the ddx, things could get tricky if the target_msc values grow  
larger than 2^32. This can happen if a client passes in large numbers  
for good or bad reasons, or even during normal glXSwapBuffers calls  
if due to long system uptime one is close to such a 32 bit boundary.

-mario

>>
>> This otoh...
>>
>> while (1) glXSwapBuffers();
>>
>> ... i think won't throttle - at least i didn't find code that would
>> do this. So you'd probably request large number of vblank events in
>> the DRM for swaps at successive vblank's. At some point i assume the
>> kernel would run out of memory for new vblank event structs or the
>> list of pending events inside the DRM would be full -> drmWaitVblank
>> () would fail with ENOMEM -> goto blitfallback; would trigger some
>> kind of behaviour and you'd have some weirdness in display - an
>> intermix of immediate swaps from the blitfallback and "zombie swaps"
>> as the vblank events get scheduled.
>
> Ah yes, this probably is an issue; we should check the swap queue  
> limit
> in DRI2SwapBuffers as well, probably by just calling  
> DRI2ThrottleClient
> at that point.
>
>> As weird as above example looks - swapping without drawing inbetween
>> - it is something that gets e.g., used for a quick & dirty
>> implementation of frame-sequential stereo (with 3d shutter glasses
>> etc.) on systems that don't support OpenGL quadbuffered stereo
>> contexts. So its not just something totally far fetched.
>
> Yeah and a malicious app could do it too, so we should handle it.
>
>> I thought quite a bit about such effects while looking at your code.
>> I think a really cool and more bullet-proof implementation of the
>> OML_sync_control spec that would allow a client to render multiple
>> frames ahead / have multiple swaps pending without getting into
>> trouble with a few race-conditions in the current code is possible,
>> but it would require a more dynamic scheduling of what happens when.
>> Maybe we can discuss such refinements when your first iteration of
>> the dri2 swap bits is finished and out there.
>
> It seems like we're in pretty good shape in terms of not hanging the
> server at least.  Not hanging the client when it passes in bad MSC
> values is a good goal though too; I'd definitely be interested in
> making that aspect more robust.
>
> -- 
> Jesse Barnes, Intel Open Source Technology Center

*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.kleiner at tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:    +49 (0)7071/601-616
www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)