Introduction and updates from NVIDIA

Tue May 3 16:53:03 UTC 2016

Hi James,

On 3 May 2016 at 17:07, James Jones <jajones at nvidia.com> wrote:
> On 04/29/2016 03:07 PM, Daniel Stone wrote:
>>> With new Wayland protocol, patches to all Wayland compositors to send
>>> proper hints to clients using this protocol, improvements to GBM, and
>>> updates to both of these when new GPU architectures introduced new
>>> requirements, what you describe could do anything streams can do.
>>> However, then the problem will have been solved only in the context of
>>> top-of-tree Wayland and Weston.
>>
>> This doesn't require explicit/new compositor interaction at all.
>> Extensions can be done within the gbm/EGL bundle itself (via
>> EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
>> bundle), and the API usage there today does seem to stand up. Given
>> that the protocol is private - I'm certainly not advocating for a
>> DRI2-style all-things-to-all-hardware standard protocol to communicate
>> this - and that it's localised in a vendor bundle, it seems completely
>> widely applicable to me. As someone who's writing this from
>> Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
>> solutions.
>
> No, the necessary extensions can not be contained within the binding. There
> is not enough information within the driver layer alone. Something needs to
> tell the driver when the configuration changes (E.g., the consumer of a
> wayland surface switches from a texture to a plane) and what the new
> configuration is. This would trigger the protocol notifications &
> subsequent optimization within the driver.  By the nature of their API,
> streams would require the compositor to take action on such configuration
> changes, and streams can discover the new configuration.  Something
> equivalent would be required to make this work in the GBM+wl_drm/EGL case.

I don't think this is the case. As I went through with Andy, we
_already_ have intent expressed in the GBM case, in exactly the same way
that EGLStreams does: consider gbm_bo_import as equivalent to
attaching an EGLOutput(Layer) consumer, and EGLImage import +
glEGLImageTargetTexture2DOES as equivalent to attaching a GL texture
consumer. This is exactly the same proxy for intent to display, and in
fact the GBM approach is slightly more flexible, because it allows you
to do both direct scanout and GPU composition at once (e.g. if you're
capturing/streaming at the same time as displaying).
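As a rough illustration of that equivalence, here is a hypothetical model (no GBM or EGL calls are made; `consumer_flags`, `output_state` and `choose_consumers` are invented names) of how the compositor's choice of import path is itself the statement of intent, including the case where both paths are taken at once:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: which consumers the compositor attaches to a client
 * buffer. In real code the scanout path would correspond to gbm_bo_import()
 * and the composition path to an EGLImage import followed by
 * glEGLImageTargetTexture2DOES(); neither call is made here. */
enum consumer_flags {
    CONSUMER_NONE    = 0,
    CONSUMER_SCANOUT = 1 << 0, /* shown directly on a KMS plane */
    CONSUMER_TEXTURE = 1 << 1, /* sampled by the GPU compositor */
};

struct output_state {
    bool plane_available; /* a free plane can display the buffer as-is */
    bool capturing;       /* e.g. screen recording also needs composition */
};

/* Returns the set of imports the compositor would perform; each import
 * tells the driver how the buffer is going to be consumed. */
static unsigned choose_consumers(const struct output_state *out)
{
    assert(out != NULL);
    unsigned flags = CONSUMER_NONE;
    if (out->plane_available)
        flags |= CONSUMER_SCANOUT;
    if (!out->plane_available || out->capturing)
        flags |= CONSUMER_TEXTURE;
    return flags;
}
```

With a free plane and capture active, the model returns both flags, which is the "scanout plus composition" case that a single retargetable stream consumer does not express.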

Again though, without stream retargeting, this is not something which
exists in Streams today, and adding it is going to require more
extensions: more code in your driver, more code in every
implementation. GBM today, for all its faults, does not require
further API extension to make this work.

> Further, as a driver vendor, the idea of requiring even in-driver
> platform-specific modifications for this sounds undesirable.  If it was
> something that could be contained entirely within GBM, that would be
> interesting.  However, distributing the architecture-specific code
> throughout the window-system specific code in the driver means a lot more
> maintenance burden in a world with X, Chrome OS, Wayland, and several
> others.

This would hold true if Streams were a perfect encapsulation, but I
don't really see how this adds any burden over layering the
winsys/platform layer on top of Streams in the first place. I mean,
you've already written Wayland bindings for Streams; how would this be
much different? Even if the protocol is designed to be the perfect
transport for Streams, you _still_ need transport bindings to your
target protocol.

>> Certainly there are, but then again, there are far more use cases than
>> EGL. Consider media playback, Vulkan, etc., where you don't have EGL
>> yet still need to solve the same problems.
>
>
> EGLStreams, Vulkan swapchains, and (for example) VDPAU presentation queues
> are all varying levels of abstraction on top of the same thing within the
> driver: a presentation engine or buffer queue, depending on whether the
> target is a physical output or a compositor.  These API-level components can
> be hooked up to each other as long as the lower-level details are fully
> contained within the driver abstraction. A Vulkan swapchain can be
> internally implemented as an EGLStream producer, for example.  In fact,
> Vulkan swapchains borrow many ideas directly and indirectly from EGLStream.

Indeed, I noted the similarity, but primarily for the device_swapchain
extension.
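The shared "buffer queue" that James describes beneath swapchains, streams and presentation queues can be sketched as a fixed-size FIFO of buffer handles between a producer and a consumer. This is a hypothetical model only: `buffer_queue`, `queue_present`, `queue_acquire` and the capacity of 3 are invented for illustration, and a real queue would also need buffer release, fences and blocking waits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 3 /* hypothetical capacity, e.g. triple buffering */

/* Hypothetical driver-internal queue that a Vulkan swapchain, an EGLStream
 * or a VDPAU presentation queue could each wrap. Buffers are small
 * integer handles. */
struct buffer_queue {
    int slots[QUEUE_CAP];
    unsigned head;  /* next slot the consumer will acquire */
    unsigned count; /* buffers queued but not yet acquired */
};

/* Producer side: hand a finished buffer to the consumer.
 * Returns false if the queue is full and the producer must wait. */
static bool queue_present(struct buffer_queue *q, int buffer)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        return false;
    q->slots[(q->head + q->count) % QUEUE_CAP] = buffer;
    q->count++;
    return true;
}

/* Consumer side (display engine or compositor): take the oldest buffer.
 * Returns false if nothing is queued. */
static bool queue_acquire(struct buffer_queue *q, int *buffer)
{
    assert(q != NULL && buffer != NULL);
    if (q->count == 0)
        return false;
    *buffer = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}
```

Whether the consumer end is a KMS plane or a compositor is invisible to the producer here, which is the property both sides of this thread are relying on; the disagreement is over which API layer should own that consumer end.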

>> I agree, and I'm not arguing this to be on the application or
>> compositor side either. I believe the GBM and HWC suggestions are
>> entirely doable, and further that these problems will need to be
>> solved outside EGL anyway, for the other use cases. My worry - quite
>> aside from how vendors who struggle to produce a conformant EGL 1.4
>> implementation today will ever implement the complexity of Streams,
>> though this isn't your problem - is that EGL is really the wrong place
>> to be solving this.
>
> Could you elaborate on what the other use cases are?  If you mean the
> Vulkan/media playback cases mentioned above, then I don't see what is
> fundamentally wrong about using EGL as a backend within the window system
> for those.  If a Vulkan application needs to display on an EGL+GLES-based
> Wayland compositor, there will be some point where a transition is made from
> Vulkan -> EGL+GLES regardless.

Media falls down because there is currently no zero-copy binding from
either hardware or software media decode engines. This may not be a
problem on your hardware, which is unusually blessed with a great deal
of memory bandwidth, but a great many devices physically cannot cope
with even a single copy in the pipeline, given the ratio of content
size to memory bandwidth. Doing this in EGL would require either a
'draw' step which simply presented an existing buffer (a step which
would needlessly involve the GPU if the pipeline runs directly from
decode to scanout), or having every media engine write its own
bindings to the Streams protocol.

There are also incredibly exacting timing requirements for media
display, which the Streams model of 'single permanently fixed latency'
does not even come close to achieving. So for that you'd need another
extension, to report actual achieved timings back. Wayland today
fulfills these requirements with the zlinux_dmabuf and
presentation_timing protocols, with the original hardware timings fed
back through KMS.
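To show why achieved timings matter to a media player, here is a hypothetical client-side calculation. It assumes the compositor reports, per frame, the time the frame actually reached the screen and the output's refresh period, both in nanoseconds; `frame_timing`, `frames_late` and `timing_error_ns` are invented names and not part of any Wayland protocol.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-frame record: when the player wanted a frame shown
 * versus when the compositor reported it was shown (timestamps assumed to
 * originate from KMS page-flip events). All values in nanoseconds. */
struct frame_timing {
    uint64_t target_ns;  /* requested presentation time */
    uint64_t actual_ns;  /* reported presentation time */
    uint32_t refresh_ns; /* reported output refresh period */
};

/* Whole refresh cycles the frame was late; 0 if on time or early. */
static uint64_t frames_late(const struct frame_timing *t)
{
    assert(t->refresh_ns > 0);
    if (t->actual_ns <= t->target_ns)
        return 0;
    return (t->actual_ns - t->target_ns) / t->refresh_ns;
}

/* Signed error, which a player could feed into A/V sync correction. */
static int64_t timing_error_ns(const struct frame_timing *t)
{
    return (int64_t)t->actual_ns - (int64_t)t->target_ns;
}
```

A model with a single fixed latency can only ever report an error of zero here, which is why the real measured value has to come back from the display hardware.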

>> I think it's large enough that it warrants a split of gl-renderer and
>> compositor-drm, rather than trying to shoehorn them into the same
>> file. There's going to be quite some complexity hiding between the
>> synchronise-with-client-event-stream and direct-scanout boxes, that
>> will push it over the limit of what's tractable. Those files are
>> already pretty huge and complex.
>
> Would it be better to wait until such complexities arise in future patches
> and split the files at that point, or would you prefer we split the backends
> now?  Perhaps I'm just more optimistic about the complexity, but it seems
> like it would be easier to evaluate once that currently-hypothetical portion
> of the code exists.

Well, there were quite a few issues with the previous set of patches,
and honestly I'm expecting just resolving those to bring enough
complexity to require a three-way split (common, Streams, and
EGLImage/GBM), let alone the features you're talking about solving
with Streams: direct scanout via retargeting of Streams, etc.

>> I share the hope, and maybe with the WSI and Streams available, we can
>> design future window systems and display control APIs towards
>> something like that. But at the moment, the impedance mismatch between
>> Streams and the (deliberately very different) Wayland and KMS APIs is
>> already fairly glaring. The winsys support is absolutely trivial to
>> write, and with winsys interactions only getting more featureful and
>> complex, so too will the common stream protocol have to become.
>>
>> If I were starting from the position of the EGL ideal: that everything
>> is EGL, and the only external interactions are creating native types
>> for it, then I would surely arrive at the same position as you. But
>> everything we've seen so far - and again, ChromeOS have taken this to
>> a much further extent - has been chipping away at EGL, rather than
>> putting more into it, and this has been for the better.
>
> The direction ChromeOS is taking is even more problematic, and I'd hate to
> see it being held up as an example of proper design direction.  We spent a
> good deal of time working with Google to support ChromeOS and ended up
> essentially allowing them to punch through the driver abstraction via very
> opaque EGL extensions that no engineer besides the extension authors could
> be expected to use correctly, and embed HW-specific knowledge within some
> component of ChromeOS, such that it will likely only run optimally on a
> single generation of our hardware and will need to be revisited.  That's the
> type of problem we're trying to avoid here.  ChromeOS has made other design
> compromises that cost us (and I suspect other vendors) 10-20% performance
> across the board to optimize for a very specific use case (i.e., a browser)
> and within very constrained schedules.  It is not the right direction for
> OS<->graphics driver interactions to evolve.

Direction and extent are two very different things: I largely agree
with their direction (less encapsulation inside vendor drivers), and
disagree on the extent to which they've taken it.

>> I don't think that's a difference we'll ever resolve though.
>
> I believe thus far we've all tried to focus objectively on specific issues,
> proposed solutions for them, and the merits of those solutions.  Weston and
> the other Wayland compositors I'm aware of are based on EGL at the moment,
> so regardless of its merits as an API it doesn't seem problematic purely
> from a dependency standpoint to add EGLStream as an option next to the
> existing EGLImage and EGLDisplay+GBM paths.  I'm certainly willing to
> continue discussing the merits of EGL on a broader scale, but does that
> discussion need to block the patches proposed here?

Every additional codepath has its cost. Even if you just look at
Mutter and Weston in a vacuum, it seems like it'll be quite the large
patchset(s) by the time it's done, let alone extending it out to all
the other compositors. This is a patchset which will need constant
care and feeding: if it's not tested, it's broken. Right now, there is
only one Streams implementation available, in a driver whose legal
status is seen to be sufficiently problematic that it is not generally
distributed by downstreams, and which requires a whole set of external
kernel patches to run. So even getting it to run is non-trivial.

But then we'd have to get it running in such a way that it was
generally available, else any refactoring or changes we wanted to make
internally would have to be blocked on testing/review from someone who
knew that backend well enough. Either that, or it would just get
broken. Introducing these codepaths has a very, very real cost to the
projects you're talking about.

You could quite rightly point to the Raspberry Pi DispManX backend as
an example of the same, and you'd be right. And that's why I'm
extremely enthused about how their new KMS/GBM driver allows us to
nuke the entire backend from orbit, and reduce our testing load by
shifting them to the generic driver.

Cheers,
Daniel