[Mesa-dev] [PATCH] i965: Throttle rendering to an fbo

Wed Mar 4 10:28:16 PST 2015

On 03/04/2015 09:52 AM, Chris Wilson wrote:
> On Wed, Mar 04, 2015 at 09:41:56AM -0800, Chad Versace wrote:
>> On 02/26/2015 05:24 AM, Chris Wilson wrote:
>>> When rendering to an fbo, even though it may be acting as a winsys
>>> frontbuffer or just generally, we never throttle. However, when rendering
>>> to an fbo, there is no natural frame boundary. Conventionally we use
>>> SwapBuffers and glFinish, but potential callers often avoid glFinish for
>>> being too heavy handed (waiting on all outstanding rendering to complete).
>>> The kernel provides a soft-throttling option for this case that waits for
>>> rendering older than 20ms to be complete (that's a little too lax to be
>>> used for swapbuffers, but is here a useful safety net). The remaining
>>> choice is then either never to throttle, throttle after every draw call,
>>> or at an intermediate user defined point such as glFlush and thus all the
>>> implied flushes. This patch opts for the latter.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>>> Cc: Kenneth Graunke <kenneth at whitecape.org>
>>> Cc: Ben Widawsky <ben at bwidawsk.net>
>>> Cc: Kristian Høgsberg <krh at bitplanet.net>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_context.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
>>> index c844888..f190df1 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_context.c
>>> +++ b/src/mesa/drivers/dri/i965/brw_context.c
>>> @@ -229,11 +229,14 @@ static void
>>>  intel_glFlush(struct gl_context *ctx)
>>>  {
>>>     struct brw_context *brw = brw_context(ctx);
>>> +   __DRIscreen *psp = brw->intelScreen->driScrnPriv;
>>>  
>>>     intel_batchbuffer_flush(brw);
>>>     intel_flush_front(ctx);
>>>     if (brw_is_front_buffer_drawing(ctx->DrawBuffer))
>>>        brw->need_throttle = true;
>>> +
>>> +   drmCommandNone(psp->fd, DRM_I915_GEM_THROTTLE);
>>>  }
>>>  
>>>  static void
>>>
>>
>> glFlush should not wait for previous rendering to complete. It's not supposed
>> to be a blocking operation.
> 
> The manpage for glFlush says
> 
> "glFlush can return at any time.  It does not wait until the execution of *all*
> previously issued GL commands is complete."
> 
> Emphasis mine. In double buffered, and normal frontbuffered (non-fbo),
> rendering the throttle is a no-op as there will not be any old rendering
> to wait upon.

That text does not appear in the GL spec. Reading the manpage alongside
the GL spec for fuller context, I think the manpage includes that phrase
simply to contrast glFlush with glFinish. In my reading, it does not imply that
glFlush may wait for *some* previously issued GL commands to complete.

As usual, the GL spec is too terse and too vague. So I quote Apple's GL documentation [1].
I believe it correctly explains the behavior of glFlush.

    Q:  What's the difference between glFlush() and glFinish()?

    A: [...] glFlush() causes all OpenGL commands currently queued to be submitted to
       the hardware for execution. This function returns immediately after having
       transferred the pending OpenGL command queue to the hardware (or software)
       renderer. These commands are queued for execution in some finite amount of time,
       but glFlush() does not block waiting for command completion.

[1] https://developer.apple.com/library/mac/qa/qa1158/_index.html
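
To make the distinction in that Q&A concrete, here is a toy model in C. It is
only a sketch: the toy_* names are hypothetical, the "renderer" is a set of
counters, and nothing here is Mesa or kernel code. It shows the reading above,
where a flush hands work over and returns, while a finish also waits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy command queue (hypothetical): counters only, no real GPU involved. */
struct toy_queue {
    size_t queued;     /* recorded by the app, not yet handed to the renderer */
    size_t submitted;  /* handed to the renderer, not yet executed */
    size_t completed;  /* executed by the renderer */
};

/* Record one command in the app-side queue. */
static void toy_draw(struct toy_queue *q)
{
    assert(q != NULL);
    q->queued++;
}

/* The renderer executes up to n of the submitted commands. */
static void toy_gpu_execute(struct toy_queue *q, size_t n)
{
    assert(q != NULL);
    if (n > q->submitted)
        n = q->submitted;
    q->submitted -= n;
    q->completed += n;
}

/* glFlush-like: submit everything queued, then return without waiting. */
static void toy_flush(struct toy_queue *q)
{
    assert(q != NULL);
    q->submitted += q->queued;
    q->queued = 0;
}

/* glFinish-like: submit everything, then wait until all of it has executed.
 * The "wait" is modeled by draining the submitted commands. */
static void toy_finish(struct toy_queue *q)
{
    toy_flush(q);
    toy_gpu_execute(q, q->submitted);
}
```

After toy_flush the commands sit in `submitted` with `completed` unchanged;
only toy_finish leaves `submitted` at zero.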

And I don't agree that the throttle is a no-op in double-buffered rendering. Consider
the following calls:

  0 // Set up the draw.
  1 glDraw();
  2 eglSwapBuffers(); --> internally calls glFlush
  3 // Set up the draw.
  4 glDraw();
  5 eglSwapBuffers(); --> internally calls glFlush
  6 // Set up the draw.
  7 glDraw();
  8 eglSwapBuffers(); --> internally calls glFlush

Before your patch, call 5 returns immediately, even if the draw at line 1 has
not completed, allowing the app to proceed to the CPU actions in line 6.
If the app calls eglSwapBuffers too frequently, then call 8 will block
as needed (assuming a nonzero swap interval, as set with eglSwapInterval,
and double-buffering).

After your patch, call 5 may block, throttling on batches that may have been
submitted during the setup in lines 3 and 4. (The glDraw at line 4 may submit
batches for resolve operations, for example.) That prevents the app from
proceeding to whatever CPU actions are planned for line 6. Double-buffered
eglSwapBuffers now sometimes blocks, behaving almost like glFinish, even when
the back buffer is free for rendering.
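
A rough way to see the cost is to model the timing. The sketch below is a toy
model, not the kernel's implementation: the toy_* names and the batch times
are made up for illustration, and the only figure taken from this thread is
the 20ms window in the commit message. It compares how long a soft throttle
and a glFinish-like wait would each block at a given moment.

```c
#include <assert.h>
#include <stddef.h>

/* One submitted batch in the toy model; all times are in milliseconds. */
struct toy_batch {
    long submitted_ms;  /* when the batch was handed to the GPU */
    long completes_ms;  /* when the GPU finishes it */
};

/* The 20ms figure quoted in the commit message. */
#define TOY_THROTTLE_WINDOW_MS 20

/* How long a soft throttle called at now_ms would block in this model:
 * it waits for batches submitted more than TOY_THROTTLE_WINDOW_MS ago
 * and ignores anything newer. */
static long toy_throttle_block_ms(const struct toy_batch *b, size_t n,
                                  long now_ms)
{
    long wait_until = now_ms;

    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (now_ms - b[i].submitted_ms > TOY_THROTTLE_WINDOW_MS &&
            b[i].completes_ms > wait_until)
            wait_until = b[i].completes_ms;
    }
    return wait_until - now_ms;
}

/* How long a glFinish-like wait called at now_ms would block:
 * every outstanding batch counts, regardless of age. */
static long toy_finish_block_ms(const struct toy_batch *b, size_t n,
                                long now_ms)
{
    long wait_until = now_ms;

    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (b[i].completes_ms > wait_until)
            wait_until = b[i].completes_ms;
    }
    return wait_until - now_ms;
}
```

With a batch submitted at 0ms that completes at 50ms and another submitted at
30ms that completes at 60ms, a throttle at 35ms blocks for 15ms in this model
and a finish blocks for 25ms. A throttle at 10ms does not block at all, since
nothing is older than the window yet.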

>> Why this patch? What are you trying to fix?
> http://patchwork.freedesktop.org/patch/43432/

A valid bug. But I'm not convinced this Mesa patch is correct.