Introduction and updates from NVIDIA

Daniel Stone daniel at fooishbar.org
Sat May 14 16:46:51 UTC 2016


Hi James,

On 12 May 2016 at 00:08, James Jones <jajones at nvidia.com> wrote:
> GBM alone can not perform as well as EGLStreams unless it is extended into
> something more or less the same as EGLStreams, where it knows exactly what
> engines are being used to produce the buffer content (along with their
> current configuration), and exactly what engines/configuration are being
> used to consume it.  This implies allocating against multiple specific
> objects, rather than a device and a set of allocation modifier flags, and/or
> importing an external allocation and hoping it meets the current
> requirements.  From what I can see, GBM fundamentally understands at most
> the consumer side of the equation.

I disagree with the last part of this. GBM is integrated with EGL, and
thus has the facility to communicate with the producer as it pleases,
through private protocol.
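
To make that concrete, this is roughly what the compositor side looks
like today: the GBM device doubles as the EGL platform display, and
binding the wl_display hands the driver a private channel back to its
own client-side half. A sketch only - the helper name is mine, and
error handling plus the usual eglGetProcAddress plumbing are elided:

#include <fcntl.h>
#include <gbm.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <wayland-server.h>

/* Illustrative compositor-side setup: the GBM device is the EGL
 * platform display, and EGL_WL_bind_wayland_display gives the driver a
 * private channel to its client-side half. */
static EGLDisplay setup_egl_on_gbm(struct wl_display *wl_dpy)
{
    int drm_fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    struct gbm_device *gbm = gbm_create_device(drm_fd);

    EGLDisplay dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_GBM_KHR,
                                              gbm, NULL);
    eglInitialize(dpy, NULL, NULL);

    /* From here on, the driver's server and client sides can exchange
     * whatever buffer metadata they like over private protocol. */
    eglBindWaylandDisplayWL(dpy, wl_dpy);
    return dpy;
}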

> Suppose however, GBM was taught everything streams know implicitly about all
> users of the buffers at allocation time.  After allocation, GBM is done with
> its job, but streams & drivers aren't.
>
> The act of transitioning a buffer from optimal "producer mode" to optimal
> "consumer mode" relies on all the device & config information as well,
> meaning it would need to be fed into the graphics driver (EGL or whatever
> window system binding is used) by each window system the graphics driver was
> running on to achieve equivalent capabilities to EGLStream.

Sure. But this leads into one huge (unaddressed) concern I have:
integration with the world outside libEGL.so. Vulkan and media APIs
are going to need to gain explicit knowledge of - read, an extra
dependency on - EGL in order to deal with this. Then let's throw a
media device into the mix: how does Streams ensure optimal
configuration? Does that require teaching EGL about media decode
devices, and growing a whole other API for that? More pressingly, how
do you deal with other devices?
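
For comparison, the non-Streams path today is that the media device
just exports a generic dma-buf fd and EGL imports it, with neither
side needing to know anything about the other. A rough sketch,
assuming a V4L2 decoder and EGL_EXT_image_dma_buf_import - the helper
name is mine, and a single-plane RGB format is used for brevity where
a real decoder would usually hand back multi-plane NV12:

#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <drm_fourcc.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Sketch: export a decoded V4L2 buffer as a dma-buf and wrap it as an
 * EGLImage. The fd is the only contract between decoder and GPU. */
static EGLImageKHR import_decoded_frame(int v4l2_fd, unsigned index,
                                        EGLDisplay dpy, EGLint width,
                                        EGLint height, EGLint pitch)
{
    struct v4l2_exportbuffer exp = {
        .type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
        .index = index,
    };
    ioctl(v4l2_fd, VIDIOC_EXPBUF, &exp);   /* exp.fd is now a dma-buf */

    const EGLint attrs[] = {
        EGL_WIDTH, width,
        EGL_HEIGHT, height,
        EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_XRGB8888,
        EGL_DMA_BUF_PLANE0_FD_EXT, exp.fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT, pitch,
        EGL_NONE,
    };
    return eglCreateImageKHR(dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT,
                             NULL, attrs);
}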

Tegra devices are in an enviable position where NVIDIA produces all
the IP, but in that regard it stands alone in the SoC world. The only
two cases I know of where the IP blocks are homogeneous are Tegra and
some Qualcomm devices - and even then, some Qualcomm SoCs use
Samsung media decode IP. The same question applies to multi-GPU
systems: how do you
do interop between an Intel GPU doing composition and an NVIDIA GPU
producing content?
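
Today that interop happens over dma-buf fds: the producer exports its
buffer object as an fd, the consumer imports it into its own GBM
device, and neither driver has to know anything about the other. A
minimal sketch, with error handling omitted and the helper name
invented:

#include <gbm.h>

/* Sketch: hand a buffer from one GPU/driver to another via a dma-buf
 * fd; each side only talks to its own GBM implementation. */
static struct gbm_bo *share_across_gpus(struct gbm_bo *produced,
                                        struct gbm_device *consumer_gbm)
{
    struct gbm_import_fd_data data = {
        .fd     = gbm_bo_get_fd(produced),      /* dma-buf fd */
        .width  = gbm_bo_get_width(produced),
        .height = gbm_bo_get_height(produced),
        .stride = gbm_bo_get_stride(produced),
        .format = gbm_bo_get_format(produced),
    };
    return gbm_bo_import(consumer_gbm, GBM_BO_IMPORT_FD, &data,
                         GBM_BO_USE_SCANOUT);
}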

From where I stand, there are two options to deal with this: one is to
declare that the world must use EGLStreams for optimal allocation,
even if they'd never previously used Streams, and the other is to
surface the interactions with Streams into a public API that can be
used by, say, media producers. Which model are you looking towards
here?

Again, NVIDIA are fine with producing a very large libEGL.so, and
Tegra's nature makes that easier to do, but what about everyone else?

> Fundamentally, the API-level view of individual graphics buffers as raw
> globally coherent & accessible stores of pixels with static layout is
> flawed.  Images on a GPU are more of a mutating spill space for a collection
> of state describing the side effects of various commands than a 2D array of
> pixels.  Forcing GPUs to resolve an image to a 2D array of pixels in any
> particular layout can be very inefficient.  The GL+GLX/EGL/etc. driver model
> hides this well, but it breaks down in a few cases like EGLImage and
> GLX_EXT_texture_from_pixmap, the former not really living up to its implied
> potential because of this, and the latter mostly working only because it has
> a very limited domain where things can be shared, but still requires a lot
> of platform-specific code to support properly.  Vulkan brings a lot more of
> this out into the open with its very explicit image state transitions and
> limitations on which engines can access an image in any given state, but
> that's just within the Vulkan API itself (I.e., strictly on a single GPU and
> optionally an associated display engine within the same driver & process) so
> far.

There's nothing in this I disagree with, but I also don't read it as an
indictment of GBM. You've previously made the point that looking
beyond frames to streams is a better way of looking at things, which
is fine, but both Wayland and KMS are fundamentally frame-based at
their core, so the impedance mismatch is already pretty obvious from
the start.
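
To illustrate what frame-based means in practice: a KMS compositor
presents exactly one completed buffer per flip and waits for the event
before it can reuse the previous one. A rough sketch using the legacy
(non-atomic) flip path, helper names mine:

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Sketch: KMS consumes whole frames, one page flip at a time; there is
 * no notion of a stream anywhere in the API. */
static void flip_done(int fd, unsigned int seq, unsigned int tv_sec,
                      unsigned int tv_usec, void *data)
{
    /* The previous framebuffer is now off-screen and safe to reuse. */
    (void)fd; (void)seq; (void)tv_sec; (void)tv_usec; (void)data;
}

static void present_frame(int drm_fd, uint32_t crtc_id, uint32_t fb_id)
{
    drmModePageFlip(drm_fd, crtc_id, fb_id,
                    DRM_MODE_PAGE_FLIP_EVENT, NULL);

    drmEventContext ev = {
        .version = DRM_EVENT_CONTEXT_VERSION,
        .page_flip_handler = flip_done,
    };
    drmHandleEvent(drm_fd, &ev);     /* blocks until the flip completes */
}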

> The EGLStream encapsulation takes into consideration the new use cases
> EGLImage, GBM, etc. were intended to address, and restores what I believe to
> be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
> allowing as much of the flexibility of the "a bunch of buffers" mental model
> as possible.  We can re-invent that with GBM API adjustments, a set of
> restrictions on how the buffers it allocates can be used, and another layer
> of metadata being pumped into drivers on top of that, but I suspect we'd
> wind up with something that looks very similar to streams.

The only allocation GBM does is for buffers produced by the compositor
and used for scanout, so in this regard it's quite straightforward.
Client buffers are a separate topic, and I don't buy that the
non-Streams model precludes things like render compression. In fact,
Ben Widawsky, Dan Vetter, and some others are as we speak working on
support for render compression within both Wayland EGL and GBM itself
(for direct scanout from compressed buffers with an auxiliary plane).
So far, the only external impact has been a very small extension to
the GBM API to allow use of multiple planes and FB modifiers: a far
smaller change than implementing the whole of Streams and all its
future extensions (Switch et al).
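
For reference, this is roughly the shape of that extension - the names
here follow the proposal and may shift before it lands, so treat them
as illustrative: allocation takes a list of acceptable modifiers, and
the per-plane handles, strides and offsets plus the chosen modifier go
straight into KMS:

#include <stdint.h>
#include <gbm.h>
#include <xf86drmMode.h>

/* Sketch of the modifier-aware allocation path: the driver picks the
 * best layout it can from the modifiers the compositor can scan out,
 * and the resulting (possibly multi-plane) BO is handed to KMS as-is.
 * GBM formats are DRM fourccs, so the same value works for both. */
static uint32_t alloc_and_add_fb(struct gbm_device *gbm, int drm_fd,
                                 uint32_t w, uint32_t h,
                                 const uint64_t *mods, unsigned n_mods)
{
    struct gbm_bo *bo = gbm_bo_create_with_modifiers(gbm, w, h,
                                                     GBM_FORMAT_XRGB8888,
                                                     mods, n_mods);

    uint32_t handles[4] = {0}, strides[4] = {0}, offsets[4] = {0};
    uint64_t modifier[4] = {0};
    uint32_t fb_id = 0;
    int planes = gbm_bo_get_plane_count(bo);   /* e.g. 2 with a CCS plane */

    for (int i = 0; i < planes; i++) {
        handles[i]  = gbm_bo_get_handle_for_plane(bo, i).u32;
        strides[i]  = gbm_bo_get_stride_for_plane(bo, i);
        offsets[i]  = gbm_bo_get_offset(bo, i);
        modifier[i] = gbm_bo_get_modifier(bo);
    }

    drmModeAddFB2WithModifiers(drm_fd, w, h, GBM_FORMAT_XRGB8888,
                               handles, strides, offsets, modifier,
                               &fb_id, DRM_MODE_FB_MODIFIERS);
    return fb_id;
}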

> We're both delving into future developments and hypotheticals to some degree
> here.  If we can't agree now on which direction is best, I believe the right
> solution is to allow the two to co-exist and compete collegially until the
> benefits of one or the other become more apparent.  The Wayland protocol and
> Weston compositor were designed in a manner that makes this as painless as
> possible.  It's not like we're going to get a ton of Wayland clients that
> suddenly rely on EGLStream.  At worst, streams lose out and some dead code
> needs to be deleted from any compositors that adopted them.  As we
> discussed, there is some maintenance cost to having two paths, but I believe
> it is reasonably contained.

It would be interesting to see the full Streams patchset - including
EGLSwitch and direct scanout - so we could judge what the final impact
would be.

As Kristian says, I really don't see where the existing non-Streams
solutions - GBM on the compositor side and private frame-based
protocols between compositor and client - leave you unable to reach
full performance potential. Do you have any concrete use cases you can
point to, in as much detail as possible, outlining exactly how the
GBM/private-Wayland-protocol model forces you to compromise
performance?

Cheers,
Daniel

