Introduction and updates from NVIDIA

Tue May 3 18:44:51 UTC 2016

On 05/03/2016 09:53 AM, Daniel Stone wrote:
> Hi James,
>
> On 3 May 2016 at 17:07, James Jones <jajones at nvidia.com> wrote:
>> On 04/29/2016 03:07 PM, Daniel Stone wrote:
>>>> With new Wayland protocol, patches to all Wayland compositors to send
>>>> proper
>>>> hints to clients using this protocol, improvements to GBM, and updates to
>>>> both of these when new GPU architectures introduced new requirements,
>>>> what
>>>> you describe could do anything streams can do. However, then the problem
>>>> will have been solved only in the context of top-of-tree Wayland and
>>>> Weston.
>>>
>>> This doesn't require explicit/new compositor interaction at all.
>>> Extensions can be done within the gbm/EGL bundle itself (via
>>> EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
>>> bundle), and the API usage there today does seem to stand up. Given
>>> that the protocol is private - I'm certainly not advocating for a
>>> DRI2-style all-things-to-all-hardware standard protocol to communicate
>>> this - and that it's localised in a vendor bundle, it seems completely
>>> widely applicable to me. As someone who's writing this from
>>> Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
>>> solutions.
>>
>> No, the necessary extensions can not be contained within the binding. There
>> is not enough information within the driver layer alone. Something needs to
>> tell the driver when the configuration changes (E.g., the consumer of a
>> wayland surface switches from a texture to a plane) and what the new
>> configuration is. This would trigger the protocol notifications &
>> subsequent optimization within the driver.  By the nature of their API,
>> streams would require the compositor to take action on such configuration
>> changes, and streams can discover the new configuration.  Something
>> equivalent would be required to make this work in the GBM+wl_drm/EGL case.
>
> I don't think this is the case. As I went through with Andy, we
> _already_ have intent expressed in the GBM case, in the exact same way
> that EGLStreams does: consider gbm_bo_import as equivalent for
> attaching to an EGLOutput(Layer) consumer, and EGLImage import +
> TargetTexture2D as equivalent for attaching a gltexture consumer.

"Will be used for display on device X" is not sufficient information, as 
Daniel Vetter outlined.

> This
> is the exact same proxy for intent to display, and in fact the GBM
> approach is slightly more flexible, because it allows you to both do
> direct scanout as well as GPU composition (e.g. if you're
> capturing/streaming at the same time as display).
>
> Again though, without stream-retargeting, this is not something which
> exists in Streams today, and doing so is going to require more
> extensions: more code in your driver, more code in every
> implementation. GBM today, for all its faults, does not require
> further API extension to make this work.

Agreed.  We're working on similar flexibility for streams via an
EGLSwitch muxing extension.  As mentioned above, though, GBM would also
require API extensions and driver updates to reach the expressiveness
of streams.

>> Further, as a driver vendor, the idea of requiring even in-driver
>> platform-specific modifications for this sounds undesirable.  If it was
>> something that could be contained entirely within GBM, that would be
>> interesting.  However, distributing the architecture-specific code
>> throughout the window-system specific code in the driver means a lot more
>> maintenance burden in a world with X, Chrome OS, Wayland, and several
>> others.
>
> This would hold true if Streams was a perfect encapsulation, but I
> don't really see how doing so adds any burden over layering the
> winsys/platform layer over Streams in the first place. I mean, you've
> written Wayland bindings for Streams in the first place ... how would
> this be too much different? Even if the protocol is designed to be the
> perfect transport for Streams, you _still_ need transport bindings to
> your target protocol.

We wrote the wayland protocol as an example of what is possible using 
streams, and we intend to open-source it.  Presumably window-system 
authors would write the protocol for other windowing systems.  Further, 
since streams would encapsulate all the device-specific stuff, the 
protocol library wouldn't require as much maintenance as a 
driver-specific protocol library.

In a world with only Wayland, yes, we'd be doing slightly more work to 
bootstrap streams support than we would to support GBM+wayland. 
However, other windowing systems and stream use cases exist.

What streams exposes is intended to lower the amount of stuff hidden in 
drivers, not increase it.  Streams is a generic swapchain mechanism 
exposed to any user, whereas we would need to write something 
proprietary (maybe open source, maybe closed source, but NVIDIA-specific 
nonetheless) for each window system to get equivalent performance if
we pushed the abstraction to a lower level.

>>> Certainly there are, but then again, there are far more usecases than
>>> EGL. Looking at media playback, Vulkan, etc, where you don't have EGL
>>> yet need to solve the same problems.
>>
>>
>> EGLStreams, Vulkan swapchains, and (for example) VDPAU presentation queues
>> are all varying levels of abstraction on top of the same thing within the
>> driver: a presentation engine or buffer queue, depending on whether the
>> target is a physical output or a compositor.  These API-level components can
>> be hooked up to each other as long as the lower-level details are fully
>> contained within the driver abstraction. A Vulkan swapchain can be
>> internally implemented as an EGLStream producer, for example.  In fact,
>> Vulkan swapchains borrow many ideas directly and indirectly from EGLStream.
>
> Indeed, I noted the similarity, but primarily for the device_swapchain
> extension.
>
>>> I agree, and I'm not arguing this to be on the application or
>>> compositor side either. I believe the GBM and HWC suggestions are
>>> entirely doable, and further that these problems will need to be
>>> solved outside EGL anyway, for the other usecases. My worry - quite
>>> aside from how vendors who struggle to produce a conformant EGL 1.4
>>> implementation today will ever implement the complexity of Streams,
>>> though this isn't your problem - is that EGL is really the wrong place
>>> to be solving this.
>>
>> Could you elaborate on what the other usecases are?  If you mean the
>> Vulkan/media playback cases mentioned above, then I don't see what is
>> fundamentally wrong about using EGL as a backend within the window system
>> for those.  If a Vulkan application needs to display on an EGL+GLES-based
>> Wayland compositor, there will be some point where a transition is made from
>> Vulkan -> EGL+GLES regardless.
>
> Media falls down because currently there is no zerocopy binding from
> either hardware or software media decode engines. Perhaps not the case
> on your hardware, unusually blessed with a great deal of memory
> bandwidth, but a great many devices physically cannot cope with a
> single copy in the pipeline, given the ratio of content size to memory
> bandwidth. Doing this in EGL would require a 'draw' step which simply
> presented an existing buffer - a step which would unnecessarily
> involve the GPU if the pipeline is direct from decode to scanout - or
> it would involve having every media engine write their own bindings to
> the Streams protocol.

Right.  Streams are meant to support lots of different producers and
consumers.

> There are also incredibly exacting timing requirements for media
> display, which the Streams model of 'single permanently fixed latency'
> does not even come close to achieving. So for that you'd need another
> extension, to report actual achieved timings back. Wayland today
> fulfills these requirements with the zlinux_dmabuf and
> presentation_timing protocols, with the original hardware timings fed
> back through KMS.

Would it be reasonable to support such existing extensions while using 
streams?

>>> I think it's large enough that it warrants a split of gl-renderer and
>>> compositor-drm, rather than trying to shoehorn them into the same
>>> file. There's going to be quite some complexity hiding between the
>>> synchronise-with-client-event-stream and direct-scanout boxes, that
>>> will push it over the limit of what's tractable. Those files are
>>> already pretty huge and complex.
>>
>> Would it be better to wait until such complexities arise in future patches
>> and split the files at that point, or would you prefer we split the backends
>> now?  Perhaps I'm just more optimistic about the complexity, but it seems
>> like it would be easier to evaluate once that currently-hypothetical portion
>> of the code exists.
>
> Well, there were quite a few issues with the previous set of patches,
> and honestly I'm expecting just resolving those to bring enough
> complexity to require a three-way split (common, Streams, and
> EGLImage/GBM), let alone the features you're talking about solving
> with Streams: direct scanout via retargeting of Streams, etc.
>
>>> I share the hope, and maybe with the WSI and Streams available, we can
>>> design future window systems and display control APIs towards
>>> something like that. But at the moment, the impedance mismatch between
>>> Streams and the (deliberately very different) Wayland and KMS APIs is
>>> already fairly glaring. The winsys support is absolutely trivial to
>>> write, and with winsys interactions only getting more featureful and
>>> complex, such will the common stream protocol have to be.
>>>
>>> If I was starting from the position of the EGL ideal: that everything
>>> is EGL, and the only external interactions are creating native types
>>> for it, then I would surely arrive at the same position as you. But
>>> everything we've seen so far - and again, ChromeOS have taken this to
>>> a much further extent - has been chipping away at EGL, rather than
>>> putting more into it, and this has been for the better.
>>
>> The direction ChromeOS is taking is even more problematic, and I'd hate to
>> see it being held up as an example of proper design direction.  We spent a
>> good deal of time working with Google to support ChromeOS and ended up
>> essentially allowing them to punch through the driver abstraction via very
>> opaque EGL extensions that no engineer besides the extension authors could
>> be expected to use correctly, and embed HW-specific knowledge within some
>> component of ChromeOS, such that it will likely only run optimally on a
>> single generation of our hardware and will need to be revisited.  That's the
>> type of problem we're trying to avoid here.  ChromeOS has made other design
>> compromises that cost us (and I suspect other vendors) 10-20% performance
>> across the board to optimize for a very specific use case (I.e., a browser)
>> and within very constrained schedules.  It is not the right direction for
>> OS<->graphics driver interactions to evolve.
>
> Direction and extent are two very different things: I largely agree
> with their direction (less encapsulation inside vendor drivers), and
> disagree on the extent to which they've taken it.

That's a very good point.  I agree minimal encapsulation is a good goal.

>>> I don't think that's a difference we'll ever resolve though.
>>
>> I believe thus far we've all tried to focus objectively on specific issues,
>> proposed solutions for them, and the merits of those solutions.  Weston and
>> the other Wayland compositors I'm aware of are based on EGL at the moment,
>> so regardless of its merits as an API it doesn't seem problematic purely
>> from a dependency standpoint to add EGLStream as an option next to the
>> existing EGLImage and EGLDisplay+GBM paths.  I'm certainly willing to
>> continue discussing the merits of EGL on a broader scale, but does that
>> discussion need to block the patches proposed here?
>
> Every additional codepath has its cost. Even if you just look at
> Mutter and Weston in a vacuum, it seems like it'll be quite the large
> patchset(s) by the time it's done, let alone extending it out to all
> the other compositors. This is a patchset which will need constant
> care and feeding: if it's not tested, it's broken. Right now, there is
> only one Streams implementation available, which is in a driver whose
> legal status is seen to be sufficiently problematic that it is not
> generally distributed by downstreams, which requires a whole set of
> external kernel patches to run. So even getting it to run is
> non-trivial.
>
> But then we'd have to do that in such a way that it was generally
> available, else any refactoring or changes we wanted to do internally
> would have to be blocked on testing/review from someone who knew that
> backend well enough. Either that, or it would just get broken.
> Introducing these codepaths has a very, very, real cost to the
> projects you're talking about.

If there were an open source implementation of streams, would that 
affect your view?

Agreed, all new code, and especially significant new branches in code,
has costs.  However, a balance always needs to be struck.

> You could quite rightly point to the Raspberry Pi DispManX backend as
> an example of the same, and you'd be right. And that's why I'm
> extremely enthused about how their new KMS/GBM driver allows us to
> nuke the entire backend from orbit, and reduce our testing load by
> shifting them to the generic driver.

I hope we can avoid an entirely forked compositor-drm/eglstream (and 
especially gl-renderer) for these reasons.  The majority of the code is 
still common and would be exercised using either path.

Thanks,
-James

> Cheers,
> Daniel
>