Introduction and updates from NVIDIA

Andy Ritger aritger at nvidia.com
Sat Apr 2 00:21:23 UTC 2016


On Fri, Mar 25, 2016 at 08:52:31AM +0900, Carsten Haitzler wrote:
> On Tue, 22 Mar 2016 17:12:52 -0700 Andy Ritger <aritger at nvidia.com> said:
> 
> > Maybe I'm not looking in the right place, but where does gbm_surface get
> > the intended plane configuration?  Are there other display-related flags
> > besides GBM_BO_USE_SCANOUT?  Then again, the particular plane doesn't
> > impact us for current GPUs.
> 
> however you will not know the intended plane config because a compositor will
> make this choice long after a buffer is allocated. it has received buffers from
> clients and now has to choose how best to display the current screen setup
> given that input. it may use the gpu to render, may assign buffers for scanout,
> or anything else. the point is the layout may and often WILL change long after
> the buffer has been allocated and even rendered to (or at least after rendering
> has started, with fences able to keep an ongoing render in sync).
> 
> so at best you can query the current config - which may no longer apply by the
> time the buffer is displayed, and streams don't solve this either. it's a
> fundamental issue: if you want a truly optimal layout, you need an explicit
> protocol at a higher layer.

Thanks.  Sorry, I think I led the discussion astray with the talk
of plane configuration.  I was only asking a clarifying question about
Daniel's speculation as to what we were concerned about in gbm.

As-is, yes, plane configuration is the domain of the Wayland compositor,
and the EGLStreams proposal doesn't alter that.  The point of the
EGLStreams proposal is to make sure that the driver performing the
hw-specific buffer allocation has a complete picture of how the buffer
will be used.  Of course, your point that the usage could change
dynamically is well taken.
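
For reference, here is roughly what the allocation path looks like
through gbm today.  The usage flags passed at allocation time are the
only description of intent the driver ever sees, and they are fixed
long before the compositor decides between texturing and scanout
(a minimal sketch; error handling omitted, device path assumed):

    #include <fcntl.h>
    #include <gbm.h>

    struct gbm_surface *alloc_client_surface(void)
    {
        /* The usage flags requested here are the driver's only view
         * of how these buffers will be used, and they cannot change
         * after allocation. */
        int fd = open("/dev/dri/card0", O_RDWR);
        struct gbm_device *gbm = gbm_create_device(fd);

        return gbm_surface_create(gbm, 1920, 1080,
                                  GBM_FORMAT_XRGB8888,
                                  GBM_BO_USE_SCANOUT |
                                  GBM_BO_USE_RENDERING);
    }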

> > Beyond choosing optimal rendering configuration, there is arbitration of
> > the scarce resources needed for optimal rendering configuration.  E.g.,
> > for a Wayland compositor flipping to client-produced buffers, presumably the
> > client's buffer needs to be allocated with GBM_BO_USE_SCANOUT.  NVIDIA's
> > display hardware requires physically contiguous buffers, so we wouldn't
> > want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
> > flag.  It would be nice to have feedback between the EGL driver instance
> > in the compositor and the EGL driver running in the client, to know how
> > the buffer is going to be used by the Wayland compositor.
> > 
> > I imagine other hardware has even more severe constraints on displayable
> > memory, though, so maybe I'm misunderstanding something about how buffers
> > are shared between wayland clients and compositors?
> 
> same thing as above. you really cannot do this at the egl level because you
> don't know the usage scenario beforehand. this really needs to happen at a
> higher level, likely with an explicit wayland protocol and client-side
> co-operation.
> 
> for example, let's pretend that we have hardware with a fixed, limited number of
> hw planes. 1 is limited 256x256 argb (cursor), 1 is yuv only (can scale and
> rotate 90 degrees), 2 are yuv or rgba (can scale and rotate), and 1 is rgba
> only (can scale and rotate).
> 
> you have 5 applications drawing stuff. some apps display some video, some
> not... do you really want all apps to split up their rendering into subsurfaces
> AND thus scanout-capable buffers? unlikely. you do not have enough planes to
> support this. so the compositor likely wants to send "hints" to clients as to
> how many planes may be available for them and what capabilities they have. the
> compositor may choose to hide the cursor plane because it's busy using it for
> the cursor. :) clients can break up their display into lots of subsurfaces and
> buffers - e.g. render browser content separately from the chrome, so it could
> pan/scroll the content simply by offsetting a larger buffer and not
> re-rendering. if one client becomes fullscreen/maximized, the compositor may
> choose to tell all clients except this one that they can't display, and
> tell this one that it has 4 planes available, so the fullscreen client can
> maximize efficiency, whilst the other hidden clients can stop using
> scanout-capable buffers (because they will likely only be displayed when task
> switching, as thumbnails etc., and thus only need memory the gpu can use as a
> texture).
> 
> but all of this would sit at a much higher level, percolating up into the
> toolkit/widget set and even client logic directly. it would require some time
> for clients to adapt and re-render.
> 
> i just don't think you can make this all magically perfect purely at the egl or
> kms or drm etc. layer. these layers are simple and explicit. the compositor will
> do a "best effort" given the buffer inputs it has. if you want this to be more
> optimal, you need to tell clients much more and then hope toolkits etc. respond.
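
Just so I follow the shape of what you're proposing, I imagine the
per-surface hint would carry something like the following (purely
hypothetical; no such protocol or structure exists today):

    #include <stdbool.h>
    #include <stdint.h>

    /* Purely hypothetical compositor-to-client hint, sent per surface
     * whenever the compositor's plane situation changes.  Nothing like
     * this exists in any Wayland protocol today. */
    struct surface_hints {
        uint32_t max_subsurfaces;  /* how many subsurfaces are worthwhile */
        uint32_t max_planes;       /* planes the compositor might assign us */
        uint32_t preferred_format; /* e.g. a DRM fourcc such as ARGB8888 */
        bool     scanout_useful;   /* false: render-only, texture path */
    };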

I'm all for pushing decision-making higher in the software stack,
in general.  But for the sorts of things you describe above, it seems
like a lot of complexity to impose on clients.  For fully optimizing
plane usage, I wonder if an HWC-like solution (along the lines of
Android's Hardware Composer) is a better way to go.

But in any case, I didn't mean to get into plane usage decisions.
The EGLStreams proposal is meant to keep plane usage decisions where
they currently are in compositors.

> > This ties into the next point...
> > 
> > The Vivante+Freescale example is a good one, but it would be more
> > interesting if they shared /some/ formats and you could only use those
> > common formats in /some/ cases.
> > 
> > I think a lot of the concern is about passing client-produced frames
> > all the way through to scanout (i.e., zero-copy).  E.g., if the wayland
> > client is producing frames that the wayland compositor is going to use
> > as a texture, then we don't want the client to decompress as part of its
> > eglSwapBuffers: the wayland compositor will texture from the compressed
> > frame for best performance.  But, if the wayland compositor is going to
> > flip to the surface, then we would want the client to decompress during
> > its eglSwapBuffers.
> 
> correct, but as above... there is no way the client CAN know what WILL be done,
> because that decision is made much later - long after the client has allocated
> and rendered its frame. the compositor then reacts to this input and makes a
> decision (and may change its decision frame by frame).

Agreed that the usage can change dynamically.  And we should make sure
that things don't fall off a cliff when the usage changes.  But I think
the important performance case is the steady state.
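
For reference, the reason EGLStreams can get the steady state right is
that the consumer endpoint is connected before the producer allocates
anything.  A minimal single-process sketch using the EGL_KHR_stream,
EGL_KHR_stream_consumer_gltexture, and EGL_KHR_stream_producer_eglsurface
extensions (display/config/context setup and error handling omitted;
in the Wayland case the stream would cross the process boundary, e.g.
via EGL_KHR_stream_cross_process_fd, but the ordering is the same):

    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>

    /* Assumes dpy and config are initialized and a GLES context is
     * current on the consumer side. */
    static EGLSurface connect_stream(EGLDisplay dpy, EGLConfig config)
    {
        PFNEGLCREATESTREAMKHRPROC createStream =
            (PFNEGLCREATESTREAMKHRPROC)
            eglGetProcAddress("eglCreateStreamKHR");
        PFNEGLSTREAMCONSUMERGLTEXTUREEXTERNALKHRPROC connectConsumer =
            (PFNEGLSTREAMCONSUMERGLTEXTUREEXTERNALKHRPROC)
            eglGetProcAddress("eglStreamConsumerGLTextureExternalKHR");
        PFNEGLCREATESTREAMPRODUCERSURFACEKHRPROC createProducerSurf =
            (PFNEGLCREATESTREAMPRODUCERSURFACEKHRPROC)
            eglGetProcAddress("eglCreateStreamProducerSurfaceKHR");

        EGLStreamKHR stream = createStream(dpy, NULL);

        /* Connect the consumer end first: the compositor will sample
         * the frames as an external texture. */
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
        connectConsumer(dpy, stream);

        /* Only now is the producer end connected, so the driver knows
         * how the frames will be consumed before it allocates them. */
        EGLint attribs[] = { EGL_WIDTH, 1920, EGL_HEIGHT, 1080, EGL_NONE };
        return createProducerSurf(dpy, config, stream, attribs);
    }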

Thanks,
- Andy

> it's an inefficiency then to de-tile and re-tile (or compress then
> decompress ... etc.). there really should be a compositor-to-client hinting
> protocol that covers how many subsurfaces might be best, what formats might be
> best, etc. - e.g. in this case, if there are many surfaces on screen, the
> compositor might just tell all clients "please stick to 1 surface with argb, no
> scanout", and until all clients re-draw and copy/convert their buffers into
> non-scanout buffers there is a cost to display (de-tile/de-compress). too
> bad. then, once all clients have adapted, things work better.
> 
> > The nice thing about EGLStreams here is that if the consumer (the Wayland
> > compositor) wants to use the content in a different way, the producer
> > must be notified first, in order to produce something suitable for the
> > new consumer.
> 
> that's the problem... the compositor (consumer) makes this decision LATER, not
> BEFORE. :) things have to work, efficiently or not, regardless of the
> compositor's (consumer's) decisions. adapting to become more efficient takes
> far more than a stream of 1 surface and a stream of buffers.
> 
> -- 
> ------------- Codito, ergo sum - "I code, therefore I am" --------------
> The Rasterman (Carsten Haitzler)    raster at rasterman.com
> 

