Introduction and updates from NVIDIA

Tue Mar 29 16:44:41 UTC 2016

Hi Andy,

On 23 March 2016 at 00:12, Andy Ritger <aritger at nvidia.com> wrote:
> Thanks for the thorough responses, Daniel.

No problem; as I said, I'm actually really happy to see an
implementation out there.

> On Tue, Mar 22, 2016 at 01:49:59PM +0000, Daniel Stone wrote:
>> On 21 March 2016 at 16:28, Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
>> >     Similarly, EGLOutput will provide means to access different
>> >     portions of display control hardware associated with an EGLDevice.
>> >
>> >     For instance, EGLOutputLayer represents a portion of display
>> >     control hardware that accepts an image as input and processes it
>> >     for presentation on a display device.
>>
>> I still struggle to see the value of what is essentially an
>> abstraction over KMS, but oh well.
>
> The intent wasn't to abstract all of KMS, just the surface presentation
> aspect where EGL and KMS intersect.  Besides the other points below,
> an additional motivation for abstraction is to allow EGL to work with
> the native modesetting APIs on other platforms (e.g., OpenWF on QNX).

Fair enough. And, ah, _that's_ where the OpenWF implementation is - I
was honestly unsure for years since the last implementation I saw was
from the ex-Hybrid NVIDIA guys in Helsinki, back when it was aimed at
Series 60.

>> Firstly, again looking at the case where a Wayland client is a stream
>> producer and the Wayland compositor is a consumer, we move from a
>> model where references to individual buffers are explicitly passed
>> through the Wayland protocol, to where those buffers merely carry a
>> reference to a stream. Again, as stated in the review of 4/7, that
>> looks like it has the potential to break some actual real-world cases,
>> and I have no idea how to solve it, other than banning mailbox mode,
>> which would seem to mostly defeat the point of Streams (more on that
>> below).
>
> Streams are just a transport for frames.  The client still explicitly
> communicates when a frame is delivered through the stream via wayland
> protocol, and the compositor controls when it grabs a new frame, via
> eglStreamConsumerAcquireKHR().  Unless there are bugs in the patches,
> the flow of buffers is still explicit and fully under the wayland protocol
> and compositor's control.

Right, I believe if you have FIFO mode and strictly enforce
synchronisation to wl_surface::frame, then you should be safe. Mailbox
mode or any other kind of SwapInterval(0) equivalent opens you up to a
series of issues.

> Also, mailbox mode versus FIFO mode should essentially equate to Vsync
> off versus Vsync on, respectively.  It shouldn't have anything to do
> with the benefits of streams, but mailbox mode is a nice feature for
> benchmarking games/simulations or naively displaying your latest &
> greatest content without tearing.

I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.

I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).

>> Secondly, looking at the compositor-drm case, the use of the dumb
>> buffer to display undefined content as a dummy modeset really makes me
>> uneasy,
>
> Yes, the use of dumb buffer in this patch series is a kludge.  If we
> were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
> pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
> Are surface-less modesets intended to be allowable in DRM?  I can hunt
> that down if that is intended to work.  Of course, better to work out
> how EGLStreams should cooperate with atomic KMS.
>
> It was definitely an oversight to not zero initialize the dumb buffer.

Right, atomic allows you separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.

Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.

There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.

>> Also, I'm not quite sure how you're testing the compositor-as-consumer
>> mode: I can't seem to see any EGL extensions which allow you to
>> connect a Wayland surface as an EGLStream consumer. Do you have
>> something else unpublished that's being used here, or is this what the
>> libnvidia-egl-wayland library is for? Or do you just have clients
>> using EGLSurfaces as normal, which happen to be implemented internally
>> as EGLStreams? (Also, that the only way to test this is through
>> proprietary drivers implementing only-just-published extensions not
>> only makes me very sad, but hugely increases the potential for this to
>> be inadvertently broken.)
>
> Sorry if this seemed cryptic.  You are correct that EGL Wayland clients
> just use EGLSurfaces as normal (no Wayland client changes), and that
> gets implemented using EGLStreams within libnvidia-egl-wayland.

Sorry, I'd missed this whilst reading through.

> FWIW, we plan to release the source to libnvidia-egl-wayland
> eventually... it has a few driver-specific warts right now, but the
> intent is that it is a vendor-independent implementation (though, using
> EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
> "wrappers".  The goal was to allow window systems to write these EGL
> platform binding themselves, so that each EGL implementation doesn't
> have to implement each EGL_KHR_platform_*.  Anyway, we'll try to get
> libnvidia-egl-wayland cleaned up and released.

Interesting!

>> >     Thus, a compositor could produce frames and feed them to an
>> >     EGLOutputLayer through an EGLStream for presentation on a display
>> >     device.
>> >
>> >     In a similar way, by attaching a GLTexture consumer to a stream, a
>> >     producer (wayland client) could feed frames to a texture, which in
>> >     turn can be used by a compositor to prepare the final frame to be
>> >     presented.
>>
>> Quick aside: this reminds me in many unfortunate ways of
>> GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
>> stuff and make it appear in a texture' model as streams, whereas most
>> of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
>> etc) have all moved explicitly _away_ from that model to passing
>> references to individual buffers, this in many ways brings us back to
>> tfp.
>
> Is that really an accurate comparison?  The texture_from_pixmap extension
> let X11 composite managers bind a single X pixmap to an OpenGL texture.
> It seems to me what was missing in TFP usage was explicit synchronization
> between X and/or OpenGL rendering into the pixmap and OpenGL texturing
> from the pixmap.

I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.

(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strictly-paired internal queue/dequeue pairing
in FIFO mode. Maybe this isn't true.)

>> >     Whenever EGL_EXT_device_drm extension is present, EGLDevice can
>> >     be used to enumerate and access DRM KMS devices, and EGLOutputLayer
>> >     to enumerate and access DRM KMS crtcs and planes.
>>
>> Again, the enumeration isn't so much used as bypassed. The original
>> enumeration is used, and all we do with the EGL objects is a) list all
>> of them, b) filter them to find the one we already have, and c)
>> perhaps replace their internal representation of the device with the
>> one we already have.
>
> That's fair in the context of this patch set.
>
> In general, EGLDevice provides device enumeration for other use cases
> where it is the basis for bootstrapping.  Maybe we could better reconcile
> udev and EGLDevice in the patch set, but some of this is a natural, though
> unfortunate, artifact of correlating objects between two enumeration APIs.

Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.

>> I'd like to look at the elephant in the room, which is why you're
>> using this in the first place (aside from general NVIDIA enthusiasm
>> for encapsulating everything within EGL Streams/Output/Device/etc,
>> dating back many years). Andy/Aaron, you've said that you found GBM to
>> be inadequate, and I'd like to find out explicitly how.
>
> Thanks.  This is the real heart of the debate.

Yes!

>> Through a few
>> snippets of IRC and NVIDIA devtalk, so far I can see:
>>
>> 'We can't choose an optimal rendering configuration, because we don't
>> know how it's going to be used' - (almost completely) untrue. The FD
>> you pass to gbm_device_create is that of the KMS device, a gbm_surface
>> contains information as to how the plane (primary or overlay) will be
>> configured,
>
> Maybe I'm not looking in the right place, but where does gbm_surface get
> the intended plane configuration?  Are there other display-related flags
> beside GBM_BO_USE_SCANOUT?  Then again, the particular plane doesn't
> impact us for current GPUs.

Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.

>> and an EGLDisplay lets you tie the rendering and scanout
>> devices together. What more information do you need? It's true that we
>> don't have a way to select individual rendering devices at the moment,
>> but as said earlier, passing an EGLDevice as an attrib to
>> GetPlatformDisplay would resolve that, as you would have the render
>> device identified by the EGLDevice and the scanout device identified
>> by the gbm_device. At that point, you have the full pipeline and can
>> determine the optimal configuration.
>
> Beyond choosing optimal rendering configuration, there is arbitration of
> the scarce resources needed for optimal rendering configuration.  E.g.,
> for Wayland compositor flipping to client-produced buffers, presumably the
> client's buffer needs to be allocated with GBM_BO_USE_SCANOUT.  NVIDIA's
> display hardware requires physically contiguous buffers, so we wouldn't
> want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
> flag.  It would be nice to have feedback between the EGL driver instance
> in the compositor and the EGL driver running in the client, to know how
> the buffer is going to be used by the Wayland compositor.
>
> I imagine other hardware has even more severe constraints on displayable
> memory, though, so maybe I'm misunderstanding something about how buffers
> are shared between wayland clients and compositors?

Ah! This is something I've very much had in mind - and have had for
quite a while, but keep getting pre-empted - for a while, but didn't
bring up as it didn't seem implemented in the current patchset. (IIRC,
jajones had some code to allow you to retarget Streams at different
consumers, but he's on leave.)

Also, I should add that there's nothing requiring clients to use GBM
to allocate. The client EGLSurface implementation is free to do purely
internal allocations that are only accessible to it, if it wants to;
gbm_bo_import would then note that the buffer is not usable for
scanout and fail the import, leaving the compositor to fall back to
EGLImage.

> This ties into the next point...
>
>> 'We don't know when to schedule decompression, because there's no
>> explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
>> For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
>> (display controller) do not share a single common format between GPU
>> render targets and IPU scanout sources, so require a mandatory
>> detiling pass in between render and display. These work just fine with
>> gbm with that pass scheduled by eglSwapBuffers. This to me seems
>> completely explicit, unless there was something else you were meaning
>> ... ?
>
> The Vivante+Freescale example is a good one, but it would be more
> interesting if they shared /some/ formats and you could only use those
> common formats in /some/ cases.

That's also fairly common, particularly for tiling. Intel has more
tiling modes than I can remember, of which only one (X-tiling) is a
valid source for scanout. As you say, physical contiguity is also a
valid requirement, plus pitch alignment.

> I think a lot of the concern is about passing client-produced frames
> all the way through to scanout (i.e., zero-copy). E.g., if the wayland
> client is producing frames that the wayland compositor is going to use
> as a texture, then we don't want the client to decompress as part of its
> eglSwapBuffers: the wayland compositor will texture from the compressed
> frame for best performance.  But, if the wayland compositor is going to
> flip to the surface, then we would want the client to decompress during
> its eglSwapBuffers.

Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.

> The nice thing about EGLStreams here is that if the consumer (the Wayland
> compositor) wants to use the content in a different way, the producer
> must be notified first, in order to produce something suitable for the
> new consumer.

I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.

Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.

Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.

Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.

That still doesn't solve the optimal-display-configuration problem -
that you have generic code determining not only the display strategy
(scanout vs. GPU composition) as well as the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.

Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)

>> 'Width, height, pitch and format aren't enough information' - this is
>> true, but not necessarily relevant. I'm not sure what the source of
>> this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
>> to be extended with a gbm_bo_get_modifier() call, which would allow
>> you to get the DRM format modifier to describe tiling/compression/et
>> al (as well as perhaps being extended to allow you to extract multiple
>> buffers/planes, e.g. to attach auxiliary compression buffers). If it's
>> not gbm, what actually is it? The only other place I can think of
>> (suggested by Pekka, I think) was the wl_drm protocol, which it should
>> be stressed is a) not required in any way by Wayland, b) not a
>> published/public protocol, c) not a stable protocol. wl_drm just
>> happens to be the way that Mesa shares buffers, just as wl_viv is how
>> Vivante's proprietary driver shares buffers, and mali_buffer_sharing
>> is how the Mali driver does it. Since the server side is bound by
>> eglBindWaylandDisplayWL and the client side is also only used through
>> EGL, there is _no_ requirement for you to also implement wl_drm. As it
>> is a hidden private Mesa protocol, there is also no requirement for
>> the protocol to remain stable.
>
> I agree that wl_drm doesn't factor into it.
>
> Maybe some of this is my confusion over what parts of gbm.h are
> application-facing, and what parts are driver-facing?  We, and
> presumably most hardware vendors, would want the ability to associate
> arbitrary metadata with gbm_bo's, but most of that metadata is
> implementation-specific, and not really something an application should
> be looking at without sacrificing portability.

All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbo_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?

Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?

>> 'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
>> the explicit buffer management on the client side does not parallel
>> EGLStreams, and notably there is no equivalent consumer interface
>> offered on the server side, but instead the individual-buffer-driven
>> approach is taken. It's true that VK_WSI_display_swapchain does exist
>> and does match the EGLStreams model fairly closely, but also that it
>> does not have universal implementation: the Intel 'anv' Mesa-based
>> driver does not implement display_swapchain, instead having an
>> interface to export a VkImage as a dmabuf. It's true that the latter
>> is not optimal (it lacks the explicit targeting required to determine
>> the most optimal tiling/compression strategy), but OTOH it is
>> precedent for explicitly avoiding the
>> VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
>> GBM avoids it for EGL on KMS.
>
> From your perspective, what would be more optimal than VkImage+dmabuf?

Well, it's pretty much on par with GBM-compositor-Wayland-client and
an EGLStreams pipeline ending in an EGLOutput. Not having something
like HWC means that you can't determine the optimal plane-allocation
strategy.

On 23 March 2016 at 00:33, Andy Ritger <aritger at nvidia.com> wrote:
> On Tue, Mar 22, 2016 at 09:52:21PM +0000, Daniel Stone wrote:
>> Agreed. One of the things I've been incredibly happy with is how our
>> platform has managed to stay completely generic and vendor-neutral so
>> far, and I'd love to preserve that.
>
> I don't think you'll find any disagreement to that from NVIDIA, either.
>
> I apologize if the EGLStreams proposal gave the impression of a
> vendor-private solution.  That wasn't the intent.  The EGLStream family
> of extensions are, after all, an open specification that any EGL vendor
> can implement.  If there are aspects of any of these EGL extensions that
> seem useful, I'd hope that Mesa would we willing to adopt them.

Indeed, this wasn't to cast any aspersions on how you guys have
developed Streams. Having it out there and having these patches has
really been tremendously useful.

> We (NVIDIA) clearly think EGLStreams is a good direction for expressing
> buffer sharing semantics.  In our ideal world, everyone would implement
> these extensions and Wayland compositors would migrate to using them as
> the generic vendor-neutral mechanism for buffer sharing :)

But here's where my problem lies. At the moment, the 'how do I
Wayland' story is very straightforward, and not entirely
coincidentally similar to ChromeOS's: you implement GBM+KMS, you
implement the ~25 LoC of libwayland-egl, you implement
EGL_EXT_platform_{gbm,wayland}, and ... that's it. Introducing Streams
as an alternate model is certainly interesting, and I understand why
you would do it, but having it as the sole option muddies the 'how do
I Wayland' story significantly.

Getting away from the vendor-bound DDX model was something we were
desperate to do (see also xf86-video-modesetting landing on GBM+EGL),
and I'd really just like to avoid that becoming 'well, for most
platforms you do this, but for this platform / these platforms, you do
this instead ...'.

Cheers,
Daniel