[RFC 0/9] gbm_surface

Tue Dec 20 18:52:15 PST 2011

On Wed, Dec 14, 2011 at 4:59 AM, Ander Conselvan de Oliveira
<conselvan2 at gmail.com> wrote:
> From: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira at intel.com>
>
> Hi,
>
> So, in London we discussed moving the drm compositor to a new
> gbm_surface interface, which would allow the buffer management
> to be handled by the egl implementation by calling eglSwapBuffers.
>
> Here's a prototype implementation of such an interface in mesa and a
> patch to wayland-demos moving compositor-drm to it. With the changes,
> the compositor creates a gbm_surface for each output and from that it
> creates egl surfaces.
>
> The compositor is still responsible for the actual flipping. In order
> to do that, the new interface includes a function for getting a bo for
> a gbm_surface's front buffer. This change does not impact the ability
> of scanning out a client buffer.

Hi Ander,

Thanks for getting a prototype implementation up so fast.  It looks
pretty good, but I think there are a few corner cases we need to think
about.  Apologies for the long, rambling email.  First of, I talked to
Robert Bragg in IRC a bit and he had a few concerns/ideas:

 - renaming gbm_surface_get_bo() to gbm_surface_lock_bo()?  I like it;
it's a little more specific and pairs better with
gbm_surface_release_bo().

1) Surface resize: We talked about adding a gbm_surface_resizee()
entrypoint that call after swapbuffer, but before any new rendering.
That will then make the surface destroy all buffers and allocate new
ones at the new size.  I don't think this really adds anything over
just destorying the surface and creating a new at the new size.  It's
not a performance critical operation (only happens when you change the
resolution) and the most expensive part is probably reallocating the
gbm bos, which we have to do anyway.

2) Freeing buffers when idle; we talked about a gbm_surface_idle()
function that would ask the gbm_surface to destroy back and auxiliary
buffers.  It's not useful if the system is just plain idle (if nothing
is happening, there's no reason to discard buffers), but if we end up
pageflipping to client provided buffers, we could free up some memory
by releasing the compositors buffers.  However, it's not clear that we
want to give up the compositors scanout buffers - it's a scarce
resource on some systems and what if we can't reallocate them?  And
again, I'm not sure gbm_surface_idle() really buys us anything over
just destroying the gbm_surface.

3) Interaction of gbm_surface_destroy() and locked buffers.  What
happens if you destroy a gbm_surface while you have one or more locked
bos (scanning out or waiting to be pageflipped).  I think we just need
to say that that's an error.  In other words, to destroy a surface,
first release all locked bo's and then destroy.  It's also an error to
gbm_bo_destroy() a locked bo and after gbm_dri_surface_release_bo()
the bo is undefined (could be destroyed etc).

My concerns are mainly around sequence and ordering of eglSwapBuffer,
gbm_surface_lock_buffer and gbm_surface_release_buffer:

4) What happens if we do two or more swapbuffers without locking the
resulting bo?  Obviously we can't lock more buffers than we've
"generated" with eglSwapBuffer, so that has to be an error.  But we
could eglSwapBuffer a few times before we do
gbm_surface_lock_buffer().  Is that useful and should that be legal?
Or should we require that eglSwapBuffer must always be followed by a
gbm_surface_lock_buffer() before you can do another eglSwapBuffer?

5) We can't block rendering if there are no buffers to render to - we
just don't have a mechanism to do that.  If we try to render and all
the buffers are locked, we can either allocate a new buffer, overwrite
a locked buffer, just discard the rendering, but it's probably
actually just an error case, really.  So to avoid the client hitting
this error case, we need a gbm_surface_can_render() type of function.
If we knew how many buffers the surface was using, we could keep track
ourselves, but the point is to avoid that.  So before rendering, the
client can check gbm_surface_can_render() (it really needs a better
name) and if that's true, it knows it can render a new frame.  And I
think we should require that even after releasing a bo.  You could
argue that the client knows there's a buffer available just after
releasing one back to the surface, but I think we need a little
flexibility there to avoid assumption about the implementation and the
number of buffers.

6) Ordering of client buffers and gbm_surface buffers.  If we require
that eglSwapBuffer must be followed by gbm_surface_lock_buffer()
before you can render again (and I think we should), this isn't an
issue.  So this is probably mostly hypothetical, but here it goes: if
we allowed the sequence eglSwapBuffer, eglSwapBuffer,
gbm_surface_lock_bo(), then there would still be a bo queued up in the
surface that we would get on the next lock bo call.  As we mix the gbm
surface with pageflipping to client buffers, we need to make sure that
the sequence of buffers that we flip to is what we want it to be.  If
we pageflip to a client buffer with a stale buffer sitting in the
gbm_surface, we'll get an out-of-order frame as we eventually go back
to pageflipping to the gbm surface buffers.  This is basically the
reason I think we should allow this and mandate that you must lock the
buffer you just rendered befure starting a new one.  An alternative
solution would be to have a way to push client buffers into the
gbm_surface, so that they appear in the same stream of buffers, in the
expected order, but that seems much more complicated than just
requiring that buffer can't queue up in the surface.

7) Finally, we need to change the rendering loop to actually make
triple buffering work the way we want.  Right now we always render in
response to the pageflip event.  Which means that when we can't render
a frame in less that the frame time, we end up dropping to half the
refresh rate.  Triple buffering can help us in two ways, depending on
the hardware: either it will just let us render as fast as possible
(ie, we get 50fps when the render time is 20ms), or if the gl stack
has a deep pipeline that lets us overlap rendering of two frames (eg,
a frame first takes 10ms of cpu processing, then 10ms on the gpu), in
which case triple buffering could let us hit 60fps (ie, it takes 20ms
to render one frame, but because of pipelining, we can produce one
every 10ms).  Either way, from the compositors point of view, the two
cases are the same: instead of rendering whenever the pageflip is
done, we need to render whenever we can (thus
gbm_surface_can_render()).  So essentially, when we're done with one
frame, we ask can_render() right away, and if true, start rendering
the next frame.If we can't render, we have to wait until a pageflip
finishes and we release a buffer, at which point we ask can_render()
again, and render the next frame if true.  For a double buffered gbm
surface, this is the same behavior as we have now.

However, if we unconditionally enable triple buffering, we'll render
an extra frame ahead for the fast case (ie when we could do 60fps
without triple buffering), so there's a latency penalty and we're
wasting memory on a third bo we don't need.  So we need to be able to
enable and disable triple buffering as needed: in some cases we'll
never need it, other cases we need it all the time, and some cases
we'll need to turn it on and off as the load changes.  We can either
have the compositor detect when it's dropping frames and provide an
explicit gbm API call to enable/disable triple buffering or perhaps to
increase and decrease the number of buffers in use.  Or we can provide
the gbm surface with the bo presentation time when we release it back
to the surface and have the surface figure out when we're dropping
frames and automatically manage the buffer count.  However for that to
work, the surface also needs to know the refresh rate and maybe more.

Alternatively (and this is starting to look more and more appealing)
we could just let the gbm surface manage the KMS page flipping.