Introduction and updates from NVIDIA

Thu Mar 24 23:52:31 UTC 2016

On Tue, 22 Mar 2016 17:12:52 -0700 Andy Ritger <aritger at nvidia.com> said:

> Maybe I'm not looking in the right place, but where does gbm_surface get
> the intended plane configuration?  Are there other display-related flags
> besides GBM_BO_USE_SCANOUT?  Then again, the particular plane doesn't
> impact us for current GPUs.

however you will not know the intended plane config because a compositor will
make this choice long after a buffer is allocated. it has received buffers from
clients and now has to choose how best to display this current screen setup
based on the input. it may use gpu to render, may assign buffers for scanout,
or anything else. the point is the layout may and often WILL change long after
the buffer has been allocated and even rendered to (or at least rendering has
started with fences able to ensure sync with an ongoing render).

so at best you can query the current config - which may already be out of date
by the time the buffer is displayed, and streams don't solve this either. it's
a fundamental issue that if you want real optimal layout, you need an explicit
protocol at a higher layer.

> Beyond choosing optimal rendering configuration, there is arbitration of
> the scarce resources needed for optimal rendering configuration.  E.g.,
> for Wayland compositor flipping to client-produced buffers, presumably the
> client's buffer needs to be allocated with GBM_BO_USE_SCANOUT.  NVIDIA's
> display hardware requires physically contiguous buffers, so we wouldn't
> want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
> flag.  It would be nice to have feedback between the EGL driver instance
> in the compositor and the EGL driver running in the client, to know how
> the buffer is going to be used by the Wayland compositor.
> 
> I imagine other hardware has even more severe constraints on displayable
> memory, though, so maybe I'm misunderstanding something about how buffers
> are shared between wayland clients and compositors?

same thing as above. you really cannot do this at the egl level because you
don't know that usage scenario beforehand. this really needs to be at a higher
level likely with an explicit wayland protocol and client-side co-operation.

for example. let's pretend that we have hardware with a fixed limited number of
hw planes. 1 is limited 256x256 argb (cursor), 1 is yuv only (can scale and
rotate 90 degrees), 2 are yuv or rgba (can scale and rotate), and 1 is rgba
only (can scale and rotate).
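
to make the pretend hardware above concrete, here is a minimal sketch of that plane table and a lookup of which planes could scan out a given buffer. everything in it is hypothetical: the names (Plane, PLANES, usable_planes), treating "argb" and "rgba" as distinct format strings, and the assumption that the cursor plane cannot scale or rotate are illustration only and not any real kms/gbm api.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Plane:
    name: str
    formats: FrozenSet[str]
    can_scale: bool
    can_rotate: bool
    max_size: Optional[Tuple[int, int]] = None  # None means no size limit


# the 5 pretend planes from the text (all names are made up)
PLANES = [
    # assumption: the cursor plane cannot scale or rotate
    Plane("cursor", frozenset({"argb"}), False, False, (256, 256)),
    Plane("yuv0", frozenset({"yuv"}), True, True),
    Plane("mixed0", frozenset({"yuv", "rgba"}), True, True),
    Plane("mixed1", frozenset({"yuv", "rgba"}), True, True),
    Plane("rgba0", frozenset({"rgba"}), True, True),
]


def usable_planes(fmt: str, width: int, height: int) -> List[str]:
    """Return the names of planes that could scan out such a buffer."""
    names = []
    for plane in PLANES:
        if fmt not in plane.formats:
            continue
        if plane.max_size is not None:
            max_w, max_h = plane.max_size
            if width > max_w or height > max_h:
                continue
        names.append(plane.name)
    return names
```

even in this toy model a 1080p yuv video buffer has only 3 candidate planes, and it competes for 2 of them with every rgba buffer on screen.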

you have 5 applications drawing stuff. some apps display some video, some
not... do you really want all apps to split up their rendering into subsurfaces
AND thus scanout capable buffers? unlikely. you do not have enough planes to
support this. so the compositor likely wants to send "hints" to clients as to
how many buffers may be available for them and what capabilities they have. the
compositor may choose to hide the cursor layer because it's busy using it for
the cursor. :) clients can break up their display into lots of subsurfaces and
buffers - eg render browser content separately from chrome so it could
pan/scroll the content simply by offsetting a larger buffer and not
re-rendering. if one client becomes fullscreen/maximized, the compositor may
choose to tell every client except this one that it is no longer displayed, and
tell this one that it has 4 planes available, so the fullscreen client can
maximize efficiency, whilst the other hidden clients can stop using scanout
capable buffers (because they likely only will be displayed when task switching
as thumbnails etc. and thus only need memory the gpu can use as a texture).
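
the kind of hinting described above could be sketched roughly like this. it is only a toy policy under stated assumptions: compute_hints, the hint fields ("planes", "scanout") and the rule for the non-fullscreen case are all made up here, and no such wayland protocol is implied to exist.

```python
from typing import Dict, List, Optional

TOTAL_PLANES = 5      # the pretend hardware from the example
RESERVED_CURSOR = 1   # the compositor keeps the cursor plane for itself


def compute_hints(clients: List[str],
                  fullscreen: Optional[str] = None) -> Dict[str, dict]:
    """Return a hypothetical per-client hint: plane count and scanout flag."""
    available = TOTAL_PLANES - RESERVED_CURSOR
    hints = {}
    for client in clients:
        if fullscreen is not None:
            if client == fullscreen:
                # the fullscreen client may use every non-cursor plane
                hints[client] = {"planes": available, "scanout": True}
            else:
                # hidden clients only need memory the gpu can texture from
                hints[client] = {"planes": 0, "scanout": False}
        elif len(clients) > available:
            # assumed policy: too many clients, so nobody is told to use scanout
            hints[client] = {"planes": 0, "scanout": False}
        else:
            # assumed policy: split the planes evenly between visible clients
            hints[client] = {"planes": available // len(clients),
                             "scanout": True}
    return hints
```

with 5 clients and nothing fullscreen, every client is told to avoid scanout buffers. once one of them goes fullscreen it is told it has 4 planes and the rest are told they have none.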

but all of this would have to happen at a much higher level, one that percolates
up into the toolkit/widget set and even client logic directly. it would require
some time for clients to adapt and re-render.

i just don't think you can make this all magically perfect at purely the egl or
kms or drm etc. layer. these layers are simple and explicit. compositor will do
a "best effort" given the buffer inputs it has. if you want this more optimal
you need to tell clients much more and then hope toolkits etc. respond.

> This ties into the next point...
> 
> The Vivante+Freescale example is a good one, but it would be more
> interesting if they shared /some/ formats and you could only use those
> common formats in /some/ cases.
> 
> I think a lot of the concern is about passing client-produced frames
> all the way through to scanout (i.e., zero-copy).  E.g., if the wayland
> client is producing frames that the wayland compositor is going to use
> as a texture, then we don't want the client to decompress as part of its
> eglSwapBuffers: the wayland compositor will texture from the compressed
> frame for best performance.  But, if the wayland compositor is going to
> flip to the surface, then we would want the client to decompress during
> its eglSwapBuffers.

correct, but as above... there is no way the client WILL know what WILL be done
because that decision is made much later. long after client has allocated and
rendered its frame. the compositor now reacts to this input and makes a
decision (and may change its decision frame by frame).

it's an inefficiency then to de-tile and re-tile (or compress then
decompress ... etc.). there really should be a compositor to client hinting
protocol that covers how many subsurfaces might be best, what formats might be
best etc. etc. - e.g. in this case if there are many surfaces on screen the
compositor might just tell all clients "please stick to 1 surface with argb, no
scanout" and at least until all clients re-draw and copy/convert their buffers
into non-scanout buffers there is a cost to display (de-tile/de-compress). too
bad. then once all clients have adapted, things work better.
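
a toy timeline of that transition, assuming a single client that starts out producing scanout-style buffers and needs a few frames to react to a hint. the function name, the "scanout"/"texture" labels and the fixed adapt latency are all assumptions for illustration, not measurements of anything.

```python
from typing import List, Tuple


def simulate(frames: int, hint_frame: int,
             adapt_latency: int) -> List[Tuple[int, str, bool]]:
    """Return (frame, client buffer kind, needs conversion) for each frame."""
    log = []
    for frame in range(frames):
        # what the compositor wants: it switches its mind at hint_frame
        wanted = "texture" if frame >= hint_frame else "scanout"
        # what the client produces: it only catches up after adapt_latency
        if frame >= hint_frame + adapt_latency:
            produced = "texture"
        else:
            produced = "scanout"
        # a mismatch stands in for the de-tile/de-compress cost
        log.append((frame, produced, produced != wanted))
    return log


def costly_frames(log: List[Tuple[int, str, bool]]) -> List[int]:
    """Frames where the compositor had to work with a mismatched buffer."""
    return [frame for frame, _, mismatch in log if mismatch]
```

for example costly_frames(simulate(6, 2, 2)) gives [2, 3]: the frames between the hint going out and the client catching up are the ones that pay the cost.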

> The nice thing about EGLStreams here is that if the consumer (the Wayland
> compositor) wants to use the content in a different way, the producer
> must be notified first, in order to produce something suitable for the
> new consumer.

that's the problem... the compositor (consumer) makes this decision LATER, not
BEFORE. :) things have to work, efficiently or not, regardless of the
compositor (consumer) decisions. adapting to become more efficient is far more
than a stream of 1 surface and a stream of buffers.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com