Introduction and updates from NVIDIA

Mon Apr 4 15:27:56 UTC 2016

Hi,

On 2 April 2016 at 01:28, Andy Ritger <aritger at nvidia.com> wrote:
> On Tue, Mar 29, 2016 at 05:44:41PM +0100, Daniel Stone wrote:
>> On 23 March 2016 at 00:12, Andy Ritger <aritger at nvidia.com> wrote:
>> > Also, mailbox mode versus FIFO mode should essentially equate to Vsync
>> > off versus Vsync on, respectively.  It shouldn't have anything to do
>> > with the benefits of streams, but mailbox mode is a nice feature for
>> > benchmarking games/simulations or naively displaying your latest &
>> > greatest content without tearing.
>>
>> I agree it's definitely a nice thing to have, but it does bring up the
>> serialisation issue: we expect any configuration performed by the
>> client (say, wl_surface::set_opaque_area to let the compositor know
>> where it can disable blending) to be fully in-line with buffer
>> attachment. The extreme case of this is resize, but there are quite a
>> few valid cases where you need serialisation.
>>
>> I don't know quite off the top of my head how you'd support mailbox
>> mode with Streams, given this constraint - you need three-way feedback
>> between the compositor (recording all associated surface state,
>> including subsurfaces), clients (recording the surface state valid
>> when that buffer was posted), and the Streams implementation
>> (determining which frames to dequeue, which to discard and return to
>> the client, etc).
>
> It is possible we don't get that all completely right in our implementation, yet.

Again this comes down to the synchronisation. In this case, assuming a
mailbox stream:
  - wl_egl_surface_resize(w1, h1)
  - gl*()
  - eglSwapBuffers() <- commit 1
  - wl_egl_surface_resize(w2, h2)
  - gl*()
  - eglSwapBuffers() <- commit 2

For this, you would need some kind of synchronisation, to ensure that
processing commit 1 didn't pick up on the differently-sized frames for
commit 2.
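
To make the ordering problem concrete, here is a toy model in Python.
Nothing in it is real EGL, Streams or Wayland API: Commit and Mailbox
are names I've made up purely for illustration, and carrying the size
inside the commit is just one assumed way of getting the
synchronisation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """One eglSwapBuffers() in the toy model: a frame plus the state it was drawn for."""
    serial: int
    size: Tuple[int, int]   # size set by the resize call before this swap
    frame: str              # stand-in for the rendered buffer


class Mailbox:
    """Hypothetical mailbox stream: only the newest commit survives."""

    def __init__(self) -> None:
        self._slot: Optional[Commit] = None

    def post(self, commit: Commit) -> Optional[Commit]:
        """Store the new commit; return the one it replaced (now discarded), if any."""
        discarded, self._slot = self._slot, commit
        return discarded

    def acquire(self) -> Optional[Commit]:
        """Compositor side: take whatever is newest, leaving the slot empty."""
        commit, self._slot = self._slot, None
        return commit
```

Because the size is stored with the frame it belongs to, a compositor
that acquires commit 2 after commit 1 was discarded can never pair
frame 2 with (w1, h1). Real surface state is much wider than a size,
so treat this as the shape of the problem rather than a solution.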

>> Right, atomic allows you to separate pipe/CRTC configuration from
>> plane/overlay configuration. So you'd have two options: one is to use
>> atomic and require the CRTC be configured with planes off before using
>> Streams to post flips, and the other is to add KMS configuration to
>> the EGL output.
>
> Yes, I think those are the two general directions, though neither
> are great.  It seems like you'd want a way to express the EGLStream to
> use in a plane of a KMS configuration, to be latched on a subsequent
> KMS atomic request.  But, one API bleeding into the other, in either
> direction, gets ugly.
>
>> Though, now I think of it, this effectively precludes one case, which
>> is scaling a Streams-sourced buffer inside the display controller. In
>> the GBM case, the compositor gets every buffer, so can configure the
>> plane scaling in line with buffer display. I don't see how you'd do
>> that with Streams.
>
> Agreed.  I think we'd need something like I described above in order to
> solve that within the context of EGLStreams.

Hm, so you'd effectively want to hand an atomic-KMS request object to
Streams, requesting that it stage its current state into that request.
The pending state is private ABI for libdrm, so doing post-hoc
rewrites wouldn't really work.

One detail which comes to mind: our assign_planes hook is what's
responsible for scanning the scene graph and pulling things out into
planes. We do a test request for each plane, to iteratively determine
(via trial and error) which scanout-candidate buffers we can and can't
hoist into planes. This can fail for any number of reasons (exceeded
global bandwidth limits, run out of shared scaler/detiling units, too
many planes on a single scanline, etc etc), so one key requirement we
have is that this fail gracefully and fall back to EGLImage
composition.

Would this work without client intervention, i.e. one buffer used in a
(failed) kernel request and then subsequently used for GPU
composition?
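
For reference, the trial-and-error loop looks roughly like the sketch
below. This is a simplified Python model and not Weston's actual code:
assign_planes here is a free function, views and planes are plain
strings, and test_commit is a hypothetical callback standing in for a
test-only atomic request (DRM_MODE_ATOMIC_TEST_ONLY).

```python
from typing import Callable, Dict, List, Tuple


def assign_planes(
    views: List[str],
    planes: List[str],
    test_commit: Callable[[Dict[str, str]], bool],
) -> Tuple[Dict[str, str], List[str]]:
    """Greedily hoist views into planes.

    Returns (plane -> view mapping that passed the test, views left for the GPU).
    """
    accepted: Dict[str, str] = {}
    gpu_views: List[str] = []
    free = list(planes)
    for view in views:
        placed = False
        for plane in free:
            # Try the state accepted so far plus this one extra plane.
            candidate = dict(accepted)
            candidate[plane] = view
            if test_commit(candidate):
                accepted = candidate
                free.remove(plane)
                placed = True
                break
        if not placed:
            # Every test failed: fall back to GPU composition for this view.
            gpu_views.append(view)
    return accepted, gpu_views
```

The point to notice is that a view which ends up in gpu_views may
already have been offered to the kernel in one or more failed test
requests, which is exactly the case the question above is about.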

>> I'd argue that synchronisation (in terms of serialisation with the
>> rest of the client's protocol stream) is missing from Streams as well,
>> at least in mailbox mode.
>>
>> (As an aside, I wonder if it's properly done in FIFO mode as well; the
>> compositor may very validly choose not to dequeue a buffer if a
>> surface is completely occluded. How does Streams then know that it can
>> submit another frame? Generally we use wl_surface::frame to deal with
>> this - the equivalent of eglSwapInterval(1) - but it sounds like
>> Streams relies more on strictly-paired internal queue/dequeue pairing
>> in FIFO mode. Maybe this isn't true.)
>
> Right: in the case that the compositor wants to drop a frame, it would
> need to dequeue it from the FIFO if it wants the client to be able to
> produce a new frame.  Otherwise, as I understand it, the client would
> block in its next call to eglSwapBuffers().

Right, this should be doable with the existing attach hooks. I had
some concerns about subsurface commits, but am not sure they hold up.
Either way, they're fixable with Weston.
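
As a rough sketch of what I mean (a toy Python model, not the Streams
API or Weston's attach code; FifoStream, on_attach and the depth
parameter are all assumptions for illustration), the attach hook
dequeues unconditionally and only keeps the frame if the surface is
visible:

```python
from collections import deque
from typing import Deque, Optional


class FifoStream:
    """Hypothetical FIFO stream with a fixed number of slots."""

    def __init__(self, depth: int = 1) -> None:
        self._depth = depth
        self._queue: Deque[str] = deque()

    def can_swap(self) -> bool:
        """True if the producer's next swap would not block."""
        return len(self._queue) < self._depth

    def swap(self, frame: str) -> None:
        """Producer side; raises instead of blocking to keep the model simple."""
        if not self.can_swap():
            raise BlockingIOError("producer would block: FIFO is full")
        self._queue.append(frame)

    def dequeue(self) -> Optional[str]:
        """Consumer side: remove and return the oldest frame, if any."""
        return self._queue.popleft() if self._queue else None


def on_attach(stream: FifoStream, occluded: bool) -> Optional[str]:
    """Always dequeue so the slot is freed; drop the frame if occluded."""
    frame = stream.dequeue()
    return None if occluded else frame
```

With a depth of one, the producer is unblocked as soon as on_attach
runs, whether or not the frame was actually used.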

>> > Maybe I'm not looking in the right place, but where does gbm_surface get
>> > the intended plane configuration?  Are there other display-related flags
>> > beside GBM_BO_USE_SCANOUT?  Then again, the particular plane doesn't
>> > impact us for current GPUs.
>>
>> Well, nowhere. By current plane configuration, I assume you're (to the
>> extent that you can discuss it) talking about asymmetric plane
>> capabilities, e.g. support for disjoint colour formats, scaling units,
>> etc? As Dan V says, I still see Streams as a rather incomplete fix to
>> this, given that plane assignment is pre-determined: what do you do
>> when your buffers are configured as optimally as possible, but the
>> compositor has picked the 'wrong' plane? I really think you need
>> something like HWC to rewrite your scene graph into the optimal setup.
>
> Yes, encapsulating the composition within something more like HWC would
> be ideal to allow for optimal use of planes.
>
> My questions above were prompted by your statement that "a gbm_surface
> contains information as to how the plane... will be configured."  Maybe I
> misunderstood what you meant by that.

Oh right: I was just talking about basic dimensions and format. Sorry
for the confusion. Which other attributes would you like to see? I
guess scaling is a fairly obvious one.

>> > I think a lot of the concern is about passing client-produced frames
>> > all the way through to scanout (i.e., zero-copy). E.g., if the wayland
>> > client is producing frames that the wayland compositor is going to use
>> > as a texture, then we don't want the client to decompress as part of its
>> > eglSwapBuffers: the wayland compositor will texture from the compressed
>> > frame for best performance.  But, if the wayland compositor is going to
>> > flip to the surface, then we would want the client to decompress during
>> > its eglSwapBuffers.
>>
>> Yes, very much so. Taking the Freescale example, you want the client
>> to do a detiling blit during its swap if the surface is a valid
>> scanout target, but not at all if it's just getting textured by the
>> GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
>> possible, but otherwise it wants to be Y/Yf/...-tiled.
>
> That is good to know.  How are those decisions made today?

The dumbest way possible: Intel and AMD drivers just force all winsys
buffers to be scanout-compatible, partly as a hangover from X11 where
it was a lot more complicated to schedule composition. Freescale is
as-yet unresolved, but I believe it goes the opposite direction and
never aims for scanout-compatible buffers, except when sat directly on
top of GBM. It's something I've been hoping to get to, but I keep
getting pre-empted.

I agree it's a massive issue though and something we need to get fixed properly.

>> I believe this is entirely doable with GBM right now, taking advantage
>> of the fact that libgbm.so and libEGL.so must be as tightly paired as
>> libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
>> or its equivalent interface in other implementations'.
>>
>> Firstly, create a new interface in wl_drm to represent a swapchain (in
>> the Vulkan sense), and modify its buffer-creation requests to take a
>> swapchain parameter. This we can do without penalty, since the only
>> users (aside from VA-API, which is really broken and also hopefully
>> soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
>> EGL_WL_bind_wayland_display, both within the same DSO.
>>
>> Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
>> to use a buffer for direct scanout) and EGLImage's
>> EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
>> composition) to determine what the compositor is actually doing with
>> these buffers, and use that to store target/intent in the swapchain.
>>
>> Thirdly, when the target/intent changes (e.g. 'was scanout every
>> frame, has been EGLImage for the last 120 frames'), send an event down
>> to the client to let it know to modify its allocation. The combination
>> of EGL/GBM are in the correct place to determine this, since between
>> them they already have to know the intersection of capabilities
>> between render and scanout.
>
> Thanks.  The suggestion in the second step is particularly interesting.
> I haven't tried to poke any holes in the proxy-for-intent cases, yet.
> Do you think those inferences are reliable?

Reliable-ish. The gbm_bo_import part is entirely reliable, since that
does only get called in assign_planes, when we've determined that we
would like to use that view as a scanout target. EGLImages will always
be created at attach time, so that's not a determination of intent,
_but_ as the configuration can change at any time without the client
posting new buffers, we do need the buffer to be EGL/GLES-compatible
as our lowest common denominator anyway.
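
A minimal sketch of how the target/intent tracking could work, in
Python for brevity. None of this exists in wl_drm, GBM or EGL today:
IntentTracker, frame_done and the initial "scanout" intent are
assumptions, and the 120-frame threshold is only borrowed from the
example earlier in the thread. The single input is whether
gbm_bo_import was called on the buffer for that frame, since that is
the reliable signal.

```python
from typing import Optional


class IntentTracker:
    """Per-swapchain record of what the compositor does with buffers."""

    def __init__(self, threshold: int = 120) -> None:
        self.threshold = threshold
        self.intent = "scanout"   # assumed starting point
        self._streak = 0          # consecutive frames disagreeing with intent

    def frame_done(self, imported_for_scanout: bool) -> Optional[str]:
        """Record one frame; return the new intent if it just changed."""
        observed = "scanout" if imported_for_scanout else "texture"
        if observed == self.intent:
            self._streak = 0
            return None
        self._streak += 1
        if self._streak < self.threshold:
            return None
        # Enough consecutive frames disagree: switch and notify the client.
        self.intent = observed
        self._streak = 0
        return observed
```

A non-None return value is where the event would be sent down to the
client to let it reallocate. Requiring a run of consecutive frames
avoids flapping when a view moves in and out of a plane for a frame or
two.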

(All of the above that I'm discussing is specific to Weston. Mutter
does not support composition bypass due to internal architectural
issues, namely its deep tie to Clutter's scene graph; Enlightenment
are still working heavily on their KMS backend and haven't got to
that point yet; and I'm not sure KWin supports it either.)

>> That still doesn't solve the optimal-display-configuration problem -
>> that you have generic code determining not only the display strategy
>> (scanout vs. GPU composition) as well as the exact display controller
>> configuration - but neither does EGLStreams, or indeed anything
>> current short of HWC.
>>
>> Do you see any problem with doing that within GBM? It's not actually
>> done yet, but then again, neither is direct scanout through Streams.
>> ;)
>
> This definitely seems worth exploring.

Great! Let me know if I can be of any use, if you do end up exploring
this angle.

>> Might also be worth striking a common misconception here: the Mesa GBM
>> implementation is _not_ canonical. gbm.h is the user-facing API you
>> have to implement, but beyond that, you don't need to be implemented
>> by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
>> what you couldn't express/hide/store - do you have any examples?
>
> Good points.  No, I don't have any examples off hand of things that
> couldn't be encapsulated within that.
>
> I agree that the Mesa GBM implementation is not canonical.  Though, it
> would be nice to avoid libgbm.so collisions.

Oh, yes. We should probably avoid creating new glvnd-type issues for
ourselves, yes ...

> Let me know if I should
> ask this separately on, e.g., mesa-dev, but would it be reasonable to
> treat Mesa's libgbm as the "vendor neutral" library?  It looks like
> there are currently two opportunities to load into libgbm:
>
> (a) Load as a "backend" DSO (i.e., get loaded by
>     mesa/src/gbm/main/backend.c:_gbm_create_device()).
>
> (b) Load as a DRI driver by the DRI libgbm backend (i.e., get loaded
>     by mesa/src/gbm/backends/dri/gbm_dri.c).
>
> For purposes of vendor-specific opaque data, it looks like (a) would
> make the most sense.  However, (b) currently conveniently infers a DSO
> name to load, by querying the name of the DRM driver that corresponds
> to the provided fd.  Maybe it would make sense to hoist some of that
> inference logic from (b) to (a)?  It probably also depends on which of
> (a) or (b) we'd consider a stabler ABI?

Yes, I'd suggest that (a) would be the better way to go, with
backend/loader logic pulled up as appropriate. egl_dri ties you quite
heavily into __DRIscreen and __DRIimage interfaces, which get you an
alarming amount of the way towards having a full Mesa driver. I guess
if that's what you guys want to do, then great, but short of that
having your own GBM backend would definitely make the most sense.

Considering that, I'd suggest hoisting the non-gbm_dri parts of GBM
out of Mesa and into a separate repository, and trying to get minigbm
built as a GBM backend as well.

Cheers,
Daniel