Introduction and updates from NVIDIA

Sat Apr 2 00:28:17 UTC 2016

On Tue, Mar 29, 2016 at 05:44:41PM +0100, Daniel Stone wrote:
> Hi Andy,
> 
> On 23 March 2016 at 00:12, Andy Ritger <aritger at nvidia.com> wrote:
> > Thanks for the thorough responses, Daniel.
> 
> No problem; as I said, I'm actually really happy to see an
> implementation out there.
> 
> > On Tue, Mar 22, 2016 at 01:49:59PM +0000, Daniel Stone wrote:
> >> On 21 March 2016 at 16:28, Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
> >> >     Similarly, EGLOutput will provide means to access different
> >> >     portions of display control hardware associated with an EGLDevice.
> >> >
> >> >     For instance, EGLOutputLayer represents a portion of display
> >> >     control hardware that accepts an image as input and processes it
> >> >     for presentation on a display device.
> >>
> >> I still struggle to see the value of what is essentially an
> >> abstraction over KMS, but oh well.
> >
> > The intent wasn't to abstract all of KMS, just the surface presentation
> > aspect where EGL and KMS intersect.  Besides the other points below,
> > an additional motivation for abstraction is to allow EGL to work with
> > the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
> 
> Fair enough. And, ah, _that's_ where the OpenWF implementation is - I
> was honestly unsure for years since the last implementation I saw was
> from the ex-Hybrid NVIDIA guys in Helsinki, back when it was aimed at
> Series 60.

Yes.  I haven't had any direct interaction with the QNX implementation
of OpenWF.  In any case, portability across OSes has been an important
part of our downstream Wayland efforts in automotive.

> >> Firstly, again looking at the case where a Wayland client is a stream
> >> producer and the Wayland compositor is a consumer, we move from a
> >> model where references to individual buffers are explicitly passed
> >> through the Wayland protocol, to where those buffers merely carry a
> >> reference to a stream. Again, as stated in the review of 4/7, that
> >> looks like it has the potential to break some actual real-world cases,
> >> and I have no idea how to solve it, other than banning mailbox mode,
> >> which would seem to mostly defeat the point of Streams (more on that
> >> below).
> >
> > Streams are just a transport for frames.  The client still explicitly
> > communicates when a frame is delivered through the stream via wayland
> > protocol, and the compositor controls when it grabs a new frame, via
> > eglStreamConsumerAcquireKHR().  Unless there are bugs in the patches,
> > the flow of buffers is still explicit and fully under the wayland protocol
> > and compositor's control.
> 
> Right, I believe if you have FIFO mode and strictly enforce
> synchronisation to wl_surface::frame, then you should be safe. Mailbox
> mode or any other kind of SwapInterval(0) equivalent opens you up to a
> series of issues.
> 
> > Also, mailbox mode versus FIFO mode should essentially equate to Vsync
> > off versus Vsync on, respectively.  It shouldn't have anything to do
> > with the benefits of streams, but mailbox mode is a nice feature for
> > benchmarking games/simulations or naively displaying your latest &
> > greatest content without tearing.
> 
> I agree it's definitely a nice thing to have, but it does bring up the
> serialisation issue: we expect any configuration performed by the
> client (say, wl_surface::set_opaque_area to let the compositor know
> where it can disable blending) to be fully in-line with buffer
> attachment. The extreme case of this is resize, but there are quite a
> few valid cases where you need serialisation.
> 
> I don't know quite off the top of my head how you'd support mailbox
> mode with Streams, given this constraint - you need three-way feedback
> between the compositor (recording all associated surface state,
> including subsurfaces), clients (recording the surface state valid
> when that buffer was posted), and the Streams implementation
> (determining which frames to dequeue, which to discard and return to
> the client, etc).

It is possible we don't get that all completely right in our implementation, yet.

> >> Secondly, looking at the compositor-drm case, the use of the dumb
> >> buffer to display undefined content as a dummy modeset really makes me
> >> uneasy,
> >
> > Yes, the use of dumb buffer in this patch series is a kludge.  If we
> > were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
> > pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
> > Are surface-less modesets intended to be allowable in DRM?  I can hunt
> > that down if that is intended to work.  Of course, better to work out
> > how EGLStreams should cooperate with atomic KMS.
> >
> > It was definitely an oversight to not zero initialize the dumb buffer.
> 
> Right, atomic allows you separate pipe/CRTC configuration from
> plane/overlay configuration. So you'd have two options: one is to use
> atomic and require the CRTC be configured with planes off before using
> Streams to post flips, and the other is to add KMS configuration to
> the EGL output.

Yes, I think those are the two general directions, though neither
are great.  It seems like you'd want a way to express the EGLStream to
use in a plane of a KMS configuration, to be latched on a subsequent
KMS atomic request.  But, one API bleeding into the other, in either
direction, gets ugly.

> Though, now I think of it, this effectively precludes one case, which
> is scaling a Streams-sourced buffer inside the display controller. In
> the GBM case, the compositor gets every buffer, so can configure the
> plane scaling in line with buffer display. I don't see how you'd do
> that with Streams.

Agreed.  I think we'd need something like I described above in order to
solve that within the context of EGLStreams.

> There's another hurdle to overcome too, which would currently preclude
> avoiding the intermediate dumb buffer at all. One of the invariants
> the atomic KMS API enforces is that (!!plane->crtc_id ==
> !!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
> without an active buffer. So again, we're left with either having the
> plane fully configured and active (assigned to a CRTC and displaying,
> I assume, a pre-allocated dumb buffer), or pushing more configuration
> into Streams - specifically, connecting an EGLOutputLayer to an
> EGLOutputPort.

Agreed.

> >> Also, I'm not quite sure how you're testing the compositor-as-consumer
> >> mode: I can't seem to see any EGL extensions which allow you to
> >> connect a Wayland surface as an EGLStream consumer. Do you have
> >> something else unpublished that's being used here, or is this what the
> >> libnvidia-egl-wayland library is for? Or do you just have clients
> >> using EGLSurfaces as normal, which happen to be implemented internally
> >> as EGLStreams? (Also, that the only way to test this is through
> >> proprietary drivers implementing only-just-published extensions not
> >> only makes me very sad, but hugely increases the potential for this to
> >> be inadvertently broken.)
> >
> > Sorry if this seemed cryptic.  You are correct that EGL Wayland clients
> > just use EGLSurfaces as normal (no Wayland client changes), and that
> > gets implemented using EGLStreams within libnvidia-egl-wayland.
> 
> Sorry, I'd missed this whilst reading through.
> 
> > FWIW, we plan to release the source to libnvidia-egl-wayland
> > eventually... it has a few driver-specific warts right now, but the
> > intent is that it is a vendor-independent implementation (though, using
> > EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
> > "wrappers".  The goal was to allow window systems to write these EGL
> > platform binding themselves, so that each EGL implementation doesn't
> > have to implement each EGL_KHR_platform_*.  Anyway, we'll try to get
> > libnvidia-egl-wayland cleaned up and released.
> 
> Interesting!
> 
> >> >     Thus, a compositor could produce frames and feed them to an
> >> >     EGLOutputLayer through an EGLStream for presentation on a display
> >> >     device.
> >> >
> >> >     In a similar way, by attaching a GLTexture consumer to a stream, a
> >> >     producer (wayland client) could feed frames to a texture, which in
> >> >     turn can be used by a compositor to prepare the final frame to be
> >> >     presented.
> >>
> >> Quick aside: this reminds me in many unfortunate ways of
> >> GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
> >> stuff and make it appear in a texture' model as streams, whereas most
> >> of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
> >> etc) have all moved explicitly _away_ from that model to passing
> >> references to individual buffers, this in many ways brings us back to
> >> tfp.
> >
> > Is that really an accurate comparison?  The texture_from_pixmap extension
> > let X11 composite managers bind a single X pixmap to an OpenGL texture.
> > It seems to me what was missing in TFP usage was explicit synchronization
> > between X and/or OpenGL rendering into the pixmap and OpenGL texturing
> > from the pixmap.
> 
> I'd argue that synchronisation (in terms of serialisation with the
> rest of the client's protocol stream) is missing from Streams as well,
> at least in mailbox mode.
> 
> (As an aside, I wonder if it's properly done in FIFO mode as well; the
> compositor may very validly choose not to dequeue a buffer if a
> surface is completely occluded. How does Streams then know that it can
> submit another frame? Generally we use wl_surface::frame to deal with
> this - the equivalent of eglSwapInterval(1) - but it sounds like
> Streams relies more on strictly-paired internal queue/dequeue pairing
> in FIFO mode. Maybe this isn't true.)

Right: in the case that the compositor wants to drop a frame, it would
need to dequeue it from the FIFO if it wants the client to be able to
produce a new frame.  Otherwise, as I understand it, the client would
block in its next call to eglSwapBuffers().

> >> >     Whenever EGL_EXT_device_drm extension is present, EGLDevice can
> >> >     be used to enumerate and access DRM KMS devices, and EGLOutputLayer
> >> >     to enumerate and access DRM KMS crtcs and planes.
> >>
> >> Again, the enumeration isn't so much used as bypassed. The original
> >> enumeration is used, and all we do with the EGL objects is a) list all
> >> of them, b) filter them to find the one we already have, and c)
> >> perhaps replace their internal representation of the device with the
> >> one we already have.
> >
> > That's fair in the context of this patch set.
> >
> > In general, EGLDevice provides device enumeration for other use cases
> > where it is the basis for bootstrapping.  Maybe we could better reconcile
> > udev and EGLDevice in the patch set, but some of this is a natural, though
> > unfortunate, artifact of correlating objects between two enumeration APIs.
> 
> Mind you, this wasn't intended as a criticism, just noting that the
> commit message didn't accurately describe the code.

Got it; thanks.

> >> I'd like to look at the elephant in the room, which is why you're
> >> using this in the first place (aside from general NVIDIA enthusiasm
> >> for encapsulating everything within EGL Streams/Output/Device/etc,
> >> dating back many years). Andy/Aaron, you've said that you found GBM to
> >> be inadequate, and I'd like to find out explicitly how.
> >
> > Thanks.  This is the real heart of the debate.
> 
> Yes!
> 
> >> Through a few
> >> snippets of IRC and NVIDIA devtalk, so far I can see:
> >>
> >> 'We can't choose an optimal rendering configuration, because we don't
> >> know how it's going to be used' - (almost completely) untrue. The FD
> >> you pass to gbm_device_create is that of the KMS device, a gbm_surface
> >> contains information as to how the plane (primary or overlay) will be
> >> configured,
> >
> > Maybe I'm not looking in the right place, but where does gbm_surface get
> > the intended plane configuration?  Are there other display-related flags
> > beside GBM_BO_USE_SCANOUT?  Then again, the particular plane doesn't
> > impact us for current GPUs.
> 
> Well, nowhere. By current plane configuration, I assume you're (to the
> extent that you can discuss it) talking about asymmetric plane
> capabilities, e.g. support for disjoint colour formats, scaling units,
> etc? As Dan V says, I still see Streams as a rather incomplete fix to
> this, given that plane assignment is pre-determined: what do you do
> when your buffers are configured as optimally as possible, but the
> compositor has picked the 'wrong' plane? I really think you need
> something like HWC to rewrite your scene graph into the optimal setup.

Yes, encapsulating the composition within something more like HWC would
be ideal to allow for optimal use of planes.

My questions above were prompted by your statement that "a gbm_surface
contains information as to how the plane... will be configured."  Maybe I
misunderstood what you meant by that.

In any case, I didn't mean to imply that EGLStreams arbitrates planes.
Rather, the intent with EGLStreams is to allow the driver to make the
most intelligent buffer allocation decisions, given the compositor's
current plane configuration.

> >> and an EGLDisplay lets you tie the rendering and scanout
> >> devices together. What more information do you need? It's true that we
> >> don't have a way to select individual rendering devices at the moment,
> >> but as said earlier, passing an EGLDevice as an attrib to
> >> GetPlatformDisplay would resolve that, as you would have the render
> >> device identified by the EGLDevice and the scanout device identified
> >> by the gbm_device. At that point, you have the full pipeline and can
> >> determine the optimal configuration.
> >
> > Beyond choosing optimal rendering configuration, there is arbitration of
> > the scarce resources needed for optimal rendering configuration.  E.g.,
> > for Wayland compositor flipping to client-produced buffers, presumably the
> > client's buffer needs to be allocated with GBM_BO_USE_SCANOUT.  NVIDIA's
> > display hardware requires physically contiguous buffers, so we wouldn't
> > want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
> > flag.  It would be nice to have feedback between the EGL driver instance
> > in the compositor and the EGL driver running in the client, to know how
> > the buffer is going to be used by the Wayland compositor.
> >
> > I imagine other hardware has even more severe constraints on displayable
> > memory, though, so maybe I'm misunderstanding something about how buffers
> > are shared between wayland clients and compositors?
> 
> Ah! This is something I've very much had in mind - and have had for
> quite a while, but keep getting pre-empted - for a while, but didn't
> bring up as it didn't seem implemented in the current patchset.

It was abstracted too well :)

I think you spelled it out below, but I'd love to hear any other thoughts
you have for the right direction on this class of resource arbitration.

> (IIRC,
> jajones had some code to allow you to retarget Streams at different
> consumers, but he's on leave.)

Yes, I think James had some draft specs related to this.  He'll be back
from paternity leave fairly soon; it will be good to get him involved
in this discussion.

> Also, I should add that there's nothing requiring clients to use GBM
> to allocate. The client EGLSurface implementation is free to do purely
> internal allocations that are only accessible to it, if it wants to;
> gbm_bo_import would then note that the buffer is not usable for
> scanout and fail the import, leaving the compositor to fall back to
> EGLImage.

Got it.

> > This ties into the next point...
> >
> >> 'We don't know when to schedule decompression, because there's no
> >> explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
> >> For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
> >> (display controller) do not share a single common format between GPU
> >> render targets and IPU scanout sources, so require a mandatory
> >> detiling pass in between render and display. These work just fine with
> >> gbm with that pass scheduled by eglSwapBuffers. This to me seems
> >> completely explicit, unless there was something else you were meaning
> >> ... ?
> >
> > The Vivante+Freescale example is a good one, but it would be more
> > interesting if they shared /some/ formats and you could only use those
> > common formats in /some/ cases.
> 
> That's also fairly common, particularly for tiling. Intel has more
> tiling modes than I can remember, of which only one (X-tiling) is a
> valid source for scanout. As you say, physical contiguity is also a
> valid requirement, plus pitch alignment.
> 
> > I think a lot of the concern is about passing client-produced frames
> > all the way through to scanout (i.e., zero-copy). E.g., if the wayland
> > client is producing frames that the wayland compositor is going to use
> > as a texture, then we don't want the client to decompress as part of its
> > eglSwapBuffers: the wayland compositor will texture from the compressed
> > frame for best performance.  But, if the wayland compositor is going to
> > flip to the surface, then we would want the client to decompress during
> > its eglSwapBuffers.
> 
> Yes, very much so. Taking the Freescale example, you want the client
> to do a detiling blit during its swap if the surface is a valid
> scanout target, but not at all if it's just getting textured by the
> GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
> possible, but otherwise it wants to be Y/Yf/...-tiled.

That is good to know.  How are those decisions made today?

> > The nice thing about EGLStreams here is that if the consumer (the Wayland
> > compositor) wants to use the content in a different way, the producer
> > must be notified first, in order to produce something suitable for the
> > new consumer.
> 
> I believe this is entirely doable with GBM right now, taking advantage
> of the fact that libgbm.so and libEGL.so must be as tightly paired as
> libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
> or its equivalent interface in other implementations'.
> 
> Firstly, create a new interface in wl_drm to represent a swapchain (in
> the Vulkan sense), and modify its buffer-creation requests to take a
> swapchain parameter. This we can do without penalty, since the only
> users (aside from VA-API, which is really broken and also hopefully
> soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
> EGL_WL_bind_wayland_display, both within the same DSO.
> 
> Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
> to use a buffer for direct scanout) and EGLImage's
> EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
> composition) to determine what the compositor is actually doing with
> these buffers, and use that to store target/intent in the swapchain.
> 
> Thirdly, when the target/intent changes (e.g. 'was scanout every
> frame, has been EGLImage for the last 120 frames'), send an event down
> to the client to let it know to modify its allocation. The combination
> of EGL/GBM are in the correct place to determine this, since between
> them they already have to know the intersection of capabilities
> between render and scanout.

Thanks.  The suggestion in the second step is particularly interesting.
I haven't tried to poke any holes in the proxy-for-intent cases, yet.
Do you think those inferences are reliable?

> That still doesn't solve the optimal-display-configuration problem -
> that you have generic code determining not only the display strategy
> (scanout vs. GPU composition) as well as the exact display controller
> configuration - but neither does EGLStreams, or indeed anything
> current short of HWC.
> 
> Do you see any problem with doing that within GBM? It's not actually
> done yet, but then again, neither is direct scanout through Streams.
> ;)

This definitely seems worth exploring.

> >> 'Width, height, pitch and format aren't enough information' - this is
> >> true, but not necessarily relevant. I'm not sure what the source of
> >> this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
> >> to be extended with a gbm_bo_get_modifier() call, which would allow
> >> you to get the DRM format modifier to describe tiling/compression/et
> >> al (as well as perhaps being extended to allow you to extract multiple
> >> buffers/planes, e.g. to attach auxiliary compression buffers). If it's
> >> not gbm, what actually is it? The only other place I can think of
> >> (suggested by Pekka, I think) was the wl_drm protocol, which it should
> >> be stressed is a) not required in any way by Wayland, b) not a
> >> published/public protocol, c) not a stable protocol. wl_drm just
> >> happens to be the way that Mesa shares buffers, just as wl_viv is how
> >> Vivante's proprietary driver shares buffers, and mali_buffer_sharing
> >> is how the Mali driver does it. Since the server side is bound by
> >> eglBindWaylandDisplayWL and the client side is also only used through
> >> EGL, there is _no_ requirement for you to also implement wl_drm. As it
> >> is a hidden private Mesa protocol, there is also no requirement for
> >> the protocol to remain stable.
> >
> > I agree that wl_drm doesn't factor into it.
> >
> > Maybe some of this is my confusion over what parts of gbm.h are
> > application-facing, and what parts are driver-facing?  We, and
> > presumably most hardware vendors, would want the ability to associate
> > arbitrary metadata with gbm_bo's, but most of that metadata is
> > implementation-specific, and not really something an application should
> > be looking at without sacrificing portability.
> 
> All of gbm.h is user-facing; how you implement that API is completely
> up to you, including arbitrary metadata. For instance, it's the driver
> that allocates its own struct gbm_surface/gbo_bo/etc (which is
> opaque), so it can do whatever it likes in terms of metadata. Is there
> anything in particular you're thinking of that you're not sure you'd
> be able to store portably?
> 
> Might also be worth striking a common misconception here: the Mesa GBM
> implementation is _not_ canonical. gbm.h is the user-facing API you
> have to implement, but beyond that, you don't need to be implemented
> by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
> what you couldn't express/hide/store - do you have any examples?

Good points.  No, I don't have any examples off hand of things that
couldn't be encapsulated within that.

I agree that the Mesa GBM implementation is not canonical.  Though, it
would be nice to avoid libgbm.so collisions.  Let me know if I should
ask this separately on, e.g., mesa-dev, but would it be reasonable to
treat Mesa's libgbm as the "vendor neutral" library?  It looks like
there are currently two opportunities to load into libgbm:

(a) Load as a "backend" DSO (i.e., get loaded by
    mesa/src/gbm/main/backend.c:_gbm_create_device()).

(b) Load as a DRI driver by the DRI libgbm backend (i.e., get loaded
    by mesa/src/gbm/backends/dri/gbm_dri.c).

For purposes of vendor-specific opaque data, it looks like (a) would
make the most sense.  However, (b) currently conveniently infers a DSO
name to load, by querying the name of the DRM driver that corresponds
to the provided fd.  Maybe it would make sense to hoist some of that
inference logic from (b) to (a)?  It probably also depends on which of
(a) or (b) we'd consider a stabler ABI?

> >> 'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
> >> the explicit buffer management on the client side does not parallel
> >> EGLStreams, and notably there is no equivalent consumer interface
> >> offered on the server side, but instead the individual-buffer-driven
> >> approach is taken. It's true that VK_WSI_display_swapchain does exist
> >> and does match the EGLStreams model fairly closely, but also that it
> >> does not have universal implementation: the Intel 'anv' Mesa-based
> >> driver does not implement display_swapchain, instead having an
> >> interface to export a VkImage as a dmabuf. It's true that the latter
> >> is not optimal (it lacks the explicit targeting required to determine
> >> the most optimal tiling/compression strategy), but OTOH it is
> >> precedent for explicitly avoiding the
> >> VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
> >> GBM avoids it for EGL on KMS.
> >
> > From your perspective, what would be more optimal than VkImage+dmabuf?
> 
> Well, it's pretty much on par with GBM-compositor-Wayland-client and
> an EGLStreams pipeline ending in an EGLOutput. Not having something
> like HWC means that you can't determine the optimal plane-allocation
> strategy.
> 
> On 23 March 2016 at 00:33, Andy Ritger <aritger at nvidia.com> wrote:
> > On Tue, Mar 22, 2016 at 09:52:21PM +0000, Daniel Stone wrote:
> >> Agreed. One of the things I've been incredibly happy with is how our
> >> platform has managed to stay completely generic and vendor-neutral so
> >> far, and I'd love to preserve that.
> >
> > I don't think you'll find any disagreement to that from NVIDIA, either.
> >
> > I apologize if the EGLStreams proposal gave the impression of a
> > vendor-private solution.  That wasn't the intent.  The EGLStream family
> > of extensions are, after all, an open specification that any EGL vendor
> > can implement.  If there are aspects of any of these EGL extensions that
> > seem useful, I'd hope that Mesa would we willing to adopt them.
> 
> Indeed, this wasn't to cast any aspersions on how you guys have
> developed Streams. Having it out there and having these patches has
> really been tremendously useful.

Great.

> > We (NVIDIA) clearly think EGLStreams is a good direction for expressing
> > buffer sharing semantics.  In our ideal world, everyone would implement
> > these extensions and Wayland compositors would migrate to using them as
> > the generic vendor-neutral mechanism for buffer sharing :)
> 
> But here's where my problem lies. At the moment, the 'how do I
> Wayland' story is very straightforward, and not entirely
> coincidentally similar to ChromeOS's: you implement GBM+KMS, you
> implement the ~25 LoC of libwayland-egl, you implement
> EGL_EXT_platform_{gbm,wayland}, and ... that's it. Introducing Streams
> as an alternate model is certainly interesting, and I understand why
> you would do it, but having it as the sole option muddies the 'how do
> I Wayland' story significantly.
> 
> Getting away from the vendor-bound DDX model was something we were
> desperate to do (see also xf86-video-modesetting landing on GBM+EGL),
> and I'd really just like to avoid that becoming 'well, for most
> platforms you do this, but for this platform / these platforms, you do
> this instead ...'.

Yes, fair and understood.

Thanks,
- Andy

> Cheers,
> Daniel