[PATCH] DRI2: fixup handling of last_swap_target

Sun Mar 7 16:49:49 PST 2010

On Mon, 8 Mar 2010 01:03:53 +0100
Mario Kleiner <mario.kleiner at tuebingen.mpg.de> wrote:

> Great. I'll try to quickly go over it again when you're done.
> 
> I think i found another issue or two:
> 
> In DRI2SwapBuffers() you have this code for glXSwapBuffersMscOML  
> handling:
> 
> 	...
>      } else {
>          /* glXSwapBuffersMscOML could have a 0 target_msc, honor it */
>          *swap_target = target_msc;
>      }
> 
> Now while my previous patch was to restrictive and didn't allow to  
> honor a 0 target_msc, this one is too liberal. It would allow a  
> client to schedule many swaps for the same target_msc value, e.g.,  
> for target_msc = 1000:
> 
> while (1) glXSwapBuffersMscOML(dpy, drawable, 1000, 0, 0);

Hm, I thought I had added some logic before that too, maybe I just have
it locally.

> 
> -> This would cause the DRM at vblank 1000 to deliver many swap- 
> vblank events during the same vblank, which i assume can cause  
> trouble with timestamping of the swaps and probably server  
> responsiveness? In the pageflipped case, you'd schedule one pageflip  
> successfully and then if i understand correctly, on the 2nd call for  
> this vblank the pageflip ioctl would fail with something like EAGAIN  
> or EBUSY, which would cause the ddx to fall back to blits. If i  
> understand correctly from I830DRI2FrameEventHandler ->  
> I830DRI2CopyRegion, blit requests get submitted to the command fifo,  
> drmCommandNone(intel->drmSubFD, DRM_I915_GEM_THROTTLE); gets called,  
> then DRI2SwapComplete gets called with the 'msc' value of delivery of  
> the vblank event - in this case with 1000.
> 
> Not sure if the GEM_THROTTLE'ing somehow takes care of such issues -  
> or the fact that the gpu command fifo must be full at some time, but  
> is there something that would prevent many calls to DRI2SwapComplete  
> with exactly the same 'msc' / vblank value and 'ust' timestamp  
> despite the fact that most of the swaps won't happen at that msc but  
> later? Would any of the throttling here - or something like a full  
> fifo - cause the server to stall or get sluggish? I don't know enough  
> about the implementation, but allowing to deliver many identical  
> vblank swap events sounds problematic to me.

If the completions end up using blits, they'll just queue to the kernel
and may even stall the server until completion if the no-tearing option
is enabled.  Otherwise we may end up blocking in the kernel if the
command ring is full.

> 
> Maybe we could change above code to something like:
> 
> 	...
>      } else {
>          /* glXSwapBuffersMscOML could have a 0 target_msc, honor it */
> 	if (target_msc <= pPriv->last_swap_target && pPriv->swap_interval)
> 		target_msc = pPriv->last_swap_target + 1;
>          *swap_target = target_msc;
>      }
> 
> This would not constrain a given target_msc to honor the exact  
> swap_interval setting - what my previous -wrong- patch did. For a  
> target_msc of zero etc., i believe this updated target_msc would  
> result in the same behaviour as the current code. It wouldn't push  
> the target_msc into the "far future", but only make sure it doesn't  
> "get stuck" in the past or at a fixed value in the future, basically  
> only ensuring the "only at most one swap per msc increment"  
> requirement of the OML_sync_control spec.

Yeah, that sounds reasonable.  I ran into the 0 bug because a basic
swap is just that; but making sure we don't schedule a bunch for the
same MSC or old MSC is important too, so I'll fix the logic.

> A similar error case that could be kind'a solved by the above would  
> be wrong ordering of swaprequests:
> 
> Assume we're at msc 500, last_swap_target something smaller than  
> e.g., 1000:
> 
> glutSolidTeapot(....);
> glXSwapBuffersMscOML(dpy, drawable, 10000, 0, 0);
> glXSwapBuffersMscOML(dpy, drawable, 9000, 0, 0);
> glXSwapBuffersMscOML(dpy, drawable, 8000, 0, 0);
> glXSwapBuffersMscOML(dpy, drawable, 7000, 0, 0);
> 
> If you follow the current code it would cause roughly the following  
> sequence, due to scheduling vblank events for execution at 7000,  
> 8000, 9000, 10000:
> 
> First swap with teapot at 7000, not at 10000 as mandated by the spec  
> (which requires execution of requests in order of submission, ie.  
> fifo order).
> Successive swaps probably at 8000, 9000, 10000.
> 
> With above code it would be the more correct (less wrong?) order  
> 10000, 10001, 10002, 10003.

Heh, yeah that's a "which wrong do you want" choice. :)  Handling the
case where a swap is requested way in the future (or past, depending on
whether you're seeing wrapping) is a bit of a pain.  Where do you cut
things off?  If the app has been working well but then suddenly misses,
we can assume scheduling caused it to fail and just make a best
effort.  The kernel has some code for handling that already in fact; I
think vblank events will be delivered for counts in the past.

> Another potential related problem is that calls to DRI2ScheduleSwap()  
> while there are already swaps pending do not throttle the client. So  
> this...
> 
> while (1) {
> 	glutSolidTeapot();
> 	glXSwapBuffers();
> }
> 
> ...is fine, because the drawing in glutSolidTeapot() will require new  
> buffers (DRI2GetBuffers() etc.) and that routine will call  
> DRI2ThrottleClient() if swaps are pending.
> 
> This otoh...
> 
> while (1) glXSwapBuffers();
> 
> ... i think won't throttle - at least i didn't find code that would  
> do this. So you'd probably request large number of vblank events in  
> the DRM for swaps at successive vblank's. At some point i assume the  
> kernel would run out of memory for new vblank event structs or the  
> list of pending events inside the DRM would be full -> drmWaitVblank 
> () would fail with ENOMEM -> goto blitfallback; would trigger some  
> kind of behaviour and you'd have some weirdness in display - an  
> intermix of immediate swaps from the blitfallback and "zombie swaps"  
> as the vblank events get scheduled.

Ah yes, this probably is an issue; we should check the swap queue limit
in DRI2SwapBuffers as well, probably by just calling DRI2ThrottleClient
at that point.

> As weird as above example looks - swapping without drawing inbetween  
> - it is something that gets e.g., used for a quick & dirty  
> implementation of frame-sequential stereo (with 3d shutter glasses  
> etc.) on systems that don't support OpenGL quadbuffered stereo  
> contexts. So its not just something totally far fetched.

Yeah and a malicious app could do it too, so we should handle it.

> I thought quite a bit about such effects while looking at your code.  
> I think a really cool and more bullet-proof implementation of the  
> OML_sync_control spec that would allow a client to render multiple  
> frames ahead / have multiple swaps pending without getting into  
> trouble with a few race-conditions in the current code is possible,  
> but it would require a more dynamic scheduling of what happens when.  
> Maybe we can discuss such refinements when your first iteration of  
> the dri2 swap bits is finished and out there.

It seems like we're in pretty good shape in terms of not hanging the
server at least.  Not hanging the client when it passes in bad MSC
values is a good goal though too; I'd definitely be interested in
making that aspect more robust.

-- 
Jesse Barnes, Intel Open Source Technology Center