Introduction and updates from NVIDIA

Andy Ritger aritger at nvidia.com
Wed Mar 23 00:12:52 UTC 2016


Thanks for the thorough responses, Daniel.

On Tue, Mar 22, 2016 at 01:49:59PM +0000, Daniel Stone wrote:
> Hi Miguel,
> 
> On 21 March 2016 at 16:28, Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
> > First of all, I'd like to introduce myself to the Wayland community: My
> > name is Miguel A. Vico, and I've been working as a Software Engineer
> > for NVIDIA for some time now, more specifically, in the Linux drivers
> > team. Although I've never spoken up here before, I've lately been
> > following the amazing work that you all have been doing.
> 
> Welcome!
> 
> I'm sorry I don't have some better news for you, but Andy and Aaron
> can tell you it's not personal: this has been going on for years.

Yes, we expected this would be somewhat controversial.  I appreciate you
looking at the patch series seriously, Daniel, and especially trying to
drill into the crux of the gbm concerns.

> > In order to make the Weston DRM compositor work with our drivers, we
> > have used EGLDevice, EGLOutput, and EGLStream objects.
> 
> This is ... unfortunate. To echo what Daniel Vetter said, on the whole
> these modesetting-in-EGL extensions don't enjoy wide support, or even
> wide implementation. That being said, it's interesting to have an
> implementation, because it has helped shape my feelings and arguments
> a little, into something more concrete.
> 
> > For those not familiar with this set of EGL structures, here I try to
> > summarize the most important parts of them, and how they would fit
> > into the current Weston DRM compositor design:
> >
> >     EGLDevice provides means to enumerate native devices, and then
> >     create an EGL display connection from them.
> 
> This is generically useful: we would like to extend
> eglGetPlatformDisplay to take an attrib naming an EGLDevice, which we
> could then use with platform_gbm (to select GPU and scanout device
> separately, either for multi-GPU systems or also for SoCs with
> discrete GPU/dispc setups) as well as platform_wayland and co.
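
(As an aside, bootstrapping a display from an EGLDevice is already
possible with the published EGL_EXT_device_enumeration and
EGL_EXT_platform_device extensions; a rough sketch follows.  The
platform_gbm attrib you describe doesn't exist yet, so the comment
marking where it would go is hypothetical.)

    #include <EGL/egl.h>
    #include <EGL/eglext.h>

    /* Sketch: enumerate EGLDevices and create an EGLDisplay from one. */
    static EGLDisplay display_from_first_device(void)
    {
        PFNEGLQUERYDEVICESEXTPROC queryDevices =
            (PFNEGLQUERYDEVICESEXTPROC)
            eglGetProcAddress("eglQueryDevicesEXT");
        PFNEGLGETPLATFORMDISPLAYEXTPROC getPlatformDisplay =
            (PFNEGLGETPLATFORMDISPLAYEXTPROC)
            eglGetProcAddress("eglGetPlatformDisplayEXT");
        EGLDeviceEXT devices[16];
        EGLint count = 0;

        if (!queryDevices || !getPlatformDisplay ||
            !queryDevices(16, devices, &count) || count < 1)
            return EGL_NO_DISPLAY;

        /* A platform_gbm attrib naming the render device would slot
         * in here; today only EGL_PLATFORM_DEVICE_EXT takes an
         * EGLDevice directly. */
        return getPlatformDisplay(EGL_PLATFORM_DEVICE_EXT,
                                  (void *)devices[0], NULL);
    }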
> 
> >     Similarly, EGLOutput will provide means to access different
> >     portions of display control hardware associated with an EGLDevice.
> >
> >     For instance, EGLOutputLayer represents a portion of display
> >     control hardware that accepts an image as input and processes it
> >     for presentation on a display device.
> 
> I still struggle to see the value of what is essentially an
> abstraction over KMS, but oh well.

The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect.  Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).

> >     EGLStream implements a mechanism to connect frame producers with
> >     frame consumers. By attaching an EGLOutputLayer consumer to a
> >     stream, a producer will be able to present frames on a display
> >     device.
> 
> This is understating things quite a bit, I think. On the
> Wayland-client side, it's a pretty big change from the EGLSurface
> model, particularly if you use the default mailbox mode (see comments
> on patch 4/7 as to how this breaks real-world setups, AFAICT).

This shouldn't change anything on the client side: libnvidia-egl-wayland.so
abstracts away the DRM buffer versus EGLStream differences.  These patches
have no effect on the behavior of clients, barring bugs.

> On the Wayland-compositor side, it's two _huge_ changes.
> 
> Firstly, again looking at the case where a Wayland client is a stream
> producer and the Wayland compositor is a consumer, we move from a
> model where references to individual buffers are explicitly passed
> through the Wayland protocol, to where those buffers merely carry a
> reference to a stream. Again, as stated in the review of 4/7, that
> looks like it has the potential to break some actual real-world cases,
> and I have no idea how to solve it, other than banning mailbox mode,
> which would seem to mostly defeat the point of Streams (more on that
> below).

Streams are just a transport for frames.  The client still explicitly
communicates when a frame is delivered through the stream via the
Wayland protocol, and the compositor controls when it grabs a new
frame, via eglStreamConsumerAcquireKHR().  Unless there are bugs in the
patches, the flow of buffers is still explicit and fully under the
control of the Wayland protocol and the compositor.
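
To make that concrete: given an EGLDisplay dpy, an EGLStreamKHR stream,
and a GL texture tex already connected as the stream's consumer through
EGL_KHR_stream_consumer_gltexture (entry points fetched via
eglGetProcAddress, elided here), the compositor's per-repaint latch is
roughly:

    /* Sketch: the compositor, not the stream, decides when to latch
     * the next client frame. */
    glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
    if (eglStreamConsumerAcquireKHR(dpy, stream)) {
        /* ... sample from 'tex' while composing the output frame ... */
        eglStreamConsumerReleaseKHR(dpy, stream);
    }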

Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively.  It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
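
In stream terms that switch is just the FIFO length chosen at stream
creation time (EGL_KHR_stream_fifo); length zero, the default, is
mailbox mode.  Sketch, given an EGLDisplay dpy:

    /* Mailbox mode (default): a new frame replaces any unconsumed
     * one. */
    EGLStreamKHR mailbox = eglCreateStreamKHR(dpy, NULL);

    /* FIFO mode: frames are consumed in order and the producer is
     * throttled, i.e. the Vsync-on behavior described above. */
    static const EGLint fifo_attribs[] = {
        EGL_STREAM_FIFO_LENGTH_KHR, 3,
        EGL_NONE,
    };
    EGLStreamKHR fifo = eglCreateStreamKHR(dpy, fifo_attribs);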

> Secondly, looking at the compositor-drm case, the use of the dumb
> buffer to display undefined content as a dummy modeset really makes me
> uneasy,

Yes, the use of a dumb buffer in this patch series is a kludge.  If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowed in DRM?  I can hunt
that down if they are intended to work.  Of course, it would be better
to work out how EGLStreams should cooperate with atomic KMS.

It was definitely an oversight not to zero-initialize the dumb buffer.
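
For what it's worth, zeroing it is cheap with the standard dumb-buffer
ioctls (sketch, given the DRM fd and mode width/height; error handling
omitted):

    #include <string.h>
    #include <sys/mman.h>
    #include <xf86drm.h>

    struct drm_mode_create_dumb create = {
        .width = width, .height = height, .bpp = 32,
    };
    drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &create);

    struct drm_mode_map_dumb map = { .handle = create.handle };
    drmIoctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &map);

    /* Clear the buffer so the dummy modeset scans out black rather
     * than undefined content. */
    void *ptr = mmap(NULL, create.size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, map.offset);
    memset(ptr, 0, create.size);
    munmap(ptr, create.size);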

> again because both gl-renderer and compositor-drm are written
> for explicit individual buffer management, rather than streams in +
> streams out. I think the combination of the two pushes them long
> beyond the point of readability, and I'd encourage you to look at
> trying to split those files up, or at least the functions within them.
> Attempting to keep both modes in there just looks like a maintenance
> nightmare, especially when this streams implementation
> (unsurprisingly) has to bypass almost the entire runtime (as opposed
> to init-time) functionality of compositor-drm.
> 
> Also, I'm not quite sure how you're testing the compositor-as-consumer
> mode: I can't seem to see any EGL extensions which allow you to
> connect a Wayland surface as an EGLStream consumer. Do you have
> something else unpublished that's being used here, or is this what the
> libnvidia-egl-wayland library is for? Or do you just have clients
> using EGLSurfaces as normal, which happen to be implemented internally
> as EGLStreams? (Also, that the only way to test this is through
> proprietary drivers implementing only-just-published extensions not
> only makes me very sad, but hugely increases the potential for this to
> be inadvertently broken.)

Sorry if this seemed cryptic.  You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.

FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers".  The goal was to allow window systems to write these EGL
platform binding themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*.  Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.

> >     Thus, a compositor could produce frames and feed them to an
> >     EGLOutputLayer through an EGLStream for presentation on a display
> >     device.
> >
> >     In a similar way, by attaching a GLTexture consumer to a stream, a
> >     producer (wayland client) could feed frames to a texture, which in
> >     turn can be used by a compositor to prepare the final frame to be
> >     presented.
> 
> Quick aside: this reminds me in many unfortunate ways of
> GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
> stuff and make it appear in a texture' model as streams, whereas most
> of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
> etc) have all moved explicitly _away_ from that model to passing
> references to individual buffers, this in many ways brings us back to
> tfp.

Is that really an accurate comparison?  The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.

> >     Whenever EGL_EXT_device_drm extension is present, EGLDevice can
> >     be used to enumerate and access DRM KMS devices, and EGLOutputLayer
> >     to enumerate and access DRM KMS crtcs and planes.
> 
> Again, the enumeration isn't so much used as bypassed. The original
> enumeration is used, and all we do with the EGL objects is a) list all
> of them, b) filter them to find the one we already have, and c)
> perhaps replace their internal representation of the device with the
> one we already have.

That's fair in the context of this patch set.

In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping.  Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
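
The mechanical half of that correlation is at least simple:
EGL_EXT_device_drm lets you query each EGLDevice for its DRM device
node and match it against the path udev reported.  Roughly:

    #include <string.h>
    #include <EGL/egl.h>
    #include <EGL/eglext.h>

    /* Sketch: find the EGLDevice backing a given DRM node path
     * (e.g. "/dev/dri/card0" as reported by udev). */
    static EGLDeviceEXT find_egl_device(const char *drm_node)
    {
        PFNEGLQUERYDEVICESEXTPROC queryDevices =
            (PFNEGLQUERYDEVICESEXTPROC)
            eglGetProcAddress("eglQueryDevicesEXT");
        PFNEGLQUERYDEVICESTRINGEXTPROC queryDeviceString =
            (PFNEGLQUERYDEVICESTRINGEXTPROC)
            eglGetProcAddress("eglQueryDeviceStringEXT");
        EGLDeviceEXT devices[16];
        EGLint count = 0;

        if (queryDevices && queryDeviceString &&
            queryDevices(16, devices, &count)) {
            for (EGLint i = 0; i < count; i++) {
                const char *node = queryDeviceString(
                    devices[i], EGL_DRM_DEVICE_FILE_EXT);
                if (node && strcmp(node, drm_node) == 0)
                    return devices[i];
            }
        }
        return EGL_NO_DEVICE_EXT;
    }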

> >     By using EGLStreams and attaching an EGLOutputLayer consumer
> >     (representing a DRM KMS crtc or plane) to it, compositor-drm can
> >     produce final composition frames and present them on a DRM device.
> 
> Arguably it's gl-renderer producing the frames, with compositor-drm
> kind of acting as a fake consumer (EGL_NV_stream_attrib).
> 
> > Additionally, in order to allow wl_buffers to be bound to EGLStreams, we
> > kludged eglQueryWaylandBufferWL(EGL_WAYLAND_BUFFER_WL) to return the
> > stream file descriptor.
> 
> As said earlier, I don't think this is the right way to go, and have
> other suggestions.
> 
> I'd like to look at the elephant in the room, which is why you're
> using this in the first place (aside from general NVIDIA enthusiasm
> for encapsulating everything within EGL Streams/Output/Device/etc,
> dating back many years). Andy/Aaron, you've said that you found GBM to
> be inadequate, and I'd like to find out explicitly how.

Thanks.  This is the real heart of the debate.

> Through a few
> snippets of IRC and NVIDIA devtalk, so far I can see:
> 
> 'We can't choose an optimal rendering configuration, because we don't
> know how it's going to be used' - (almost completely) untrue. The FD
> you pass to gbm_device_create is that of the KMS device, a gbm_surface
> contains information as to how the plane (primary or overlay) will be
> configured,

Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration?  Are there other display-related flags
besides GBM_BO_USE_SCANOUT?  Then again, the particular plane doesn't
matter for us on current GPUs.

> and an EGLDisplay lets you tie the rendering and scanout
> devices together. What more information do you need? It's true that we
> don't have a way to select individual rendering devices at the moment,
> but as said earlier, passing an EGLDevice as an attrib to
> GetPlatformDisplay would resolve that, as you would have the render
> device identified by the EGLDevice and the scanout device identified
> by the gbm_device. At that point, you have the full pipeline and can
> determine the optimal configuration.

Beyond choosing an optimal rendering configuration, there is arbitration
of the scarce resources that configuration needs.  E.g., for a Wayland
compositor flipping to client-produced buffers, presumably the client's
buffer needs to be allocated with GBM_BO_USE_SCANOUT.  NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag.  It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.

I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
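
In gbm terms, the choice we'd want feedback for is just the allocation
flag (sketch, given a struct gbm_device *gbm; the note about contiguous
memory is our hardware's constraint, not part of the gbm API):

    #include <gbm.h>

    /* Renderable only: fine whenever the compositor will texture
     * from the buffer. */
    struct gbm_bo *tex_bo =
        gbm_bo_create(gbm, 1920, 1080, GBM_FORMAT_XRGB8888,
                      GBM_BO_USE_RENDERING);

    /* Also scanout-capable: physically contiguous for us, so we
     * don't want clients picking this unconditionally. */
    struct gbm_bo *scan_bo =
        gbm_bo_create(gbm, 1920, 1080, GBM_FORMAT_XRGB8888,
                      GBM_BO_USE_RENDERING | GBM_BO_USE_SCANOUT);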

This ties into the next point...

> 'We don't know when to schedule decompression, because there's no
> explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
> For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
> (display controller) do not share a single common format between GPU
> render targets and IPU scanout sources, so require a mandatory
> detiling pass in between render and display. These work just fine with
> gbm with that pass scheduled by eglSwapBuffers. This to me seems
> completely explicit, unless there was something else you were meaning
> ... ?

The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.

I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy).  E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance.  But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
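
The compositor-side half of that decision is the familiar try-to-flip,
else-composite pattern (hedged sketch, assuming the client buffer's
handles/pitches/offsets arrays are already known; not the actual weston
code):

    uint32_t fb_id;
    if (drmModeAddFB2(fd, w, h, DRM_FORMAT_XRGB8888,
                      handles, pitches, offsets, &fb_id, 0) == 0) {
        /* Zero-copy: flip directly to the client's buffer. */
        drmModePageFlip(fd, crtc_id, fb_id,
                        DRM_MODE_PAGE_FLIP_EVENT, output);
    } else {
        /* Fall back: texture from the client buffer and composite. */
    }

    /* The trouble: the client already chose, at render time, whether
     * to produce scanout-friendly (e.g. decompressed) content. */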

The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.

> 'Width, height, pitch and format aren't enough information' - this is
> true, but not necessarily relevant. I'm not sure what the source of
> this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
> to be extended with a gbm_bo_get_modifier() call, which would allow
> you to get the DRM format modifier to describe tiling/compression/et
> al (as well as perhaps being extended to allow you to extract multiple
> buffers/planes, e.g. to attach auxiliary compression buffers). If it's
> not gbm, what actually is it? The only other place I can think of
> (suggested by Pekka, I think) was the wl_drm protocol, which it should
> be stressed is a) not required in any way by Wayland, b) not a
> published/public protocol, c) not a stable protocol. wl_drm just
> happens to be the way that Mesa shares buffers, just as wl_viv is how
> Vivante's proprietary driver shares buffers, and mali_buffer_sharing
> is how the Mali driver does it. Since the server side is bound by
> eglBindWaylandDisplayWL and the client side is also only used through
> EGL, there is _no_ requirement for you to also implement wl_drm. As it
> is a hidden private Mesa protocol, there is also no requirement for
> the protocol to remain stable.

I agree that wl_drm doesn't factor into it.

Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing?  We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
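
On the application-facing side, gbm.h does already have a per-bo
user-data hook (weston uses it to cache a DRM fb per bo);
implementation-specific metadata would instead have to live privately
behind the gbm driver.  Sketch of the public hook, with a hypothetical
struct my_fb:

    static void destroy_my_fb(struct gbm_bo *bo, void *data)
    {
        /* free whatever was cached against this bo */
    }

    gbm_bo_set_user_data(bo, my_fb, destroy_my_fb);
    struct my_fb *cached = gbm_bo_get_user_data(bo);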

> 'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
> the explicit buffer management on the client side does not parallel
> EGLStreams, and notably there is no equivalent consumer interface
> offered on the server side, but instead the individual-buffer-driven
> approach is taken. It's true that VK_WSI_display_swapchain does exist
> and does match the EGLStreams model fairly closely, but also that it
> does not have universal implementation: the Intel 'anv' Mesa-based
> driver does not implement display_swapchain, instead having an
> interface to export a VkImage as a dmabuf. It's true that the latter
> is not optimal (it lacks the explicit targeting required to determine
> the most optimal tiling/compression strategy), but OTOH it is
> precedent for explicitly avoiding the
> VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
> GBM avoids it for EGL on KMS.

From your perspective, what would be more optimal than VkImage+dmabuf?

> I think it's been good to have this series to push the discussion
> further in more concrete terms, but unfortunately I have to say that
> I'm even less convinced now than I have ever been. Sorry.

Thanks for the feedback so far.
- Andy

> Cheers,
> Daniel

