Introduction and updates from NVIDIA

James Jones jajones at nvidia.com
Wed May 11 20:43:45 UTC 2016


On 05/04/2016 08:56 AM, Daniel Stone wrote:
> Hi,
> Interleaving both replies ...
>
> On 3 May 2016 at 19:44, James Jones <jajones at nvidia.com> wrote:
>> On 05/03/2016 09:53 AM, Daniel Stone wrote:
>>> On 3 May 2016 at 17:07, James Jones <jajones at nvidia.com> wrote:
>>>> No, the necessary extensions cannot be contained within the binding.
>>>> There is not enough information within the driver layer alone.
>>>> Something needs to tell the driver when the configuration changes
>>>> (e.g., the consumer of a wayland surface switches from a texture to a
>>>> plane) and what the new configuration is. This would trigger the
>>>> protocol notifications & subsequent optimization within the driver.
>>>> By the nature of their API, streams would require the compositor to
>>>> take action on such configuration changes, and streams can discover
>>>> the new configuration.  Something equivalent would be required to
>>>> make this work in the GBM+wl_drm/EGL case.
>>>
>>> I don't think this is the case. As I went through with Andy, we
>>> _already_ have intent expressed in the GBM case, in the exact same way
>>> that EGLStreams does: consider gbm_bo_import as equivalent for
>>> attaching to an EGLOutput(Layer) consumer, and EGLImage import +
>>> TargetTexture2D as equivalent for attaching a gltexture consumer.
>>
>>
>> "Will be used for display on device X" is not sufficient information, as
>> Daniel Vetter outlined.
>
> Indeed, but nothing we have - including both the initial Streams
> patchset, and the subsequent proposals for adding muxing as well as
> KMS config passthrough - is sufficient for that.
>
> The Streams Check/Commit proposal you outlined a couple of mails ago
> isn't sufficient because you often need to know global configuration
> to determine if a configuration is even usable, let alone optimal:
> shared decompression/detiling units, global bandwidth/watermark
> limits, etc. Having just one entrypoint to Streams where it gets very
> limited information about each plane that Streams is using isn't
> enough, because it needs to know the global configuration.
>
> So to actually make this work on other hardware, you'd need to pass
> the full request (including content which came via other sources, e.g.
> dmabuf) through to Streams. And by the time you're handing your entire
> scene graph off to an external component to determine the optimal
> configuration ... well, that's basically HWC.

I'm sorry for mixing them up again by alluding to Daniel Vetter's 
statement, but there are two separate things being discussed here:

-A fully optimal scene-graph.  This is important, but not solved by 
streams alone.  Streams could work as one of several building blocks in 
a solution for this.

-Optimal presentation and allocation of buffers between two endpoints 
(i.e., optimizing frame allocation and delivery for what Weston can do 
right now).  My claim was that current streams solve this, while current 
GBM does not provide enough information for even this optimization.

Solving the global scene graph optimization problem is important, but 
will require additional work.  The incremental gains from using streams 
(worth around 10% of raw throughput on Kepler-based NVIDIA GPUs, for 
example; reportedly more on later hardware, though I've not yet 
benchmarked that directly) should not be ignored just because they 
don't achieve perfection in a single step.  Incremental improvements 
are still valuable.
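
To make the two-endpoint case concrete, here is a rough sketch using 
the EGL_KHR_stream family of extensions (assuming the extension entry 
points have been resolved via eglGetProcAddress(), and eliding error 
handling).  The point is that the consumer endpoint is attached before 
the producer surface exists, so the implementation knows both ends of 
the swapchain before it allocates any frames:

#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Sketch only: assumes EGL_KHR_stream,
 * EGL_KHR_stream_consumer_gltexture, and
 * EGL_KHR_stream_producer_eglsurface are supported, and that a
 * context with a bound GL_TEXTURE_EXTERNAL_OES texture is current
 * on the consumer side. */
EGLSurface
create_stream_swapchain(EGLDisplay dpy, EGLConfig config,
                        EGLint width, EGLint height)
{
    EGLStreamKHR stream = eglCreateStreamKHR(dpy, NULL);

    /* Attach the consumer first, so the producer's allocations can
     * be made with full knowledge of how frames will be used. */
    eglStreamConsumerGLTextureExternalKHR(dpy, stream);

    /* The producer endpoint: frames rendered to this surface are
     * delivered through the stream to the consumer's texture. */
    EGLint surf_attribs[] = {
        EGL_WIDTH, width,
        EGL_HEIGHT, height,
        EGL_NONE,
    };
    return eglCreateStreamProducerSurfaceKHR(dpy, config, stream,
                                             surf_attribs);
}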

> I'm also not sure what the plan is for integrating with Vulkan
> compositors: does that end up as an interop extension? Does VK WSI
> gain an equivalent which allows you to mux swapchain/device_swapchain?
> (Similar questions for the Check/Commit API really.)

Yes, if an EGL-based client were presenting to a Vulkan-based 
compositor, interop would be happening somewhere.  Either 
yet-to-be-developed Vulkan primitives could be used to implement the 
wayland-egl library with interop on the client side, or EGLStreams 
could be used to implement the wayland-egl library with interop on the 
server side.  Or there could be EGL->(wl_drm)->Vulkan, which is 
essentially two interop steps, but that has the same shortcomings 
we've been discussing for the current EGL->(wl_drm)->EGL/GBM+DRM 
situation.

>>> This
>>> is the exact same proxy for intent to display, and in fact the GBM
>>> approach is slightly more flexible, because it allows you to both do
>>> direct scanout as well as GPU composition (e.g. if you're
>>> capturing/streaming at the same time as display).
>>>
>>> Again though, without stream-retargeting, this is not something which
>>> exists in Streams today, and doing so is going to require more
>>> extensions: more code in your driver, more code in every
>>> implementation. GBM today, for all its faults, does not require
>>> further API extension to make this work.
>>
>> Agreed.  We're working on similar flexibility for streams via an EGLSwitch
>> muxing extension.  As mentioned above, GBM would require API extensions and
>> driver updates to reach the expressiveness of streams as well though.
>
> Right - but as with the point I was making below, GBM _right now_ is
> more capable than Streams _right now_. GBM right now would require API
> additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
> the last two aren't written either, so. (More below.)

The current behavior that enables this, where basically all Wayland 
buffers must be allocated as scanout-capable, isn't reasonable on NVIDIA 
hardware.  The requirements for scanout are too onerous.  I'm sure it 
works in demos on nouveau, but it's not realistic for a production driver.
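
For reference, this is roughly the allocation pattern I mean (a 
sketch; the gbm_surface path behaves analogously).  Keeping the 
direct-scanout option open means every such buffer carries the 
scanout constraint from the moment it is created:

#include <gbm.h>

/* Sketch of the allocation pattern under discussion: to keep the
 * direct-scanout option open, the buffer must be created with
 * GBM_BO_USE_SCANOUT up front, which forces the most restrictive
 * placement and layout the display engine accepts, even if the
 * buffer only ever ends up sampled as a GL texture. */
struct gbm_bo *
allocate_client_buffer(struct gbm_device *gbm,
                       uint32_t width, uint32_t height)
{
    return gbm_bo_create(gbm, width, height, GBM_FORMAT_XRGB8888,
                         GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
}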

>> What streams exposes is intended to lower the amount of stuff hidden in
>> drivers, not increase it.  Streams is a generic swapchain mechanism exposed
>> to any user, whereas we would need to write something proprietary (maybe
>> open source, maybe closed source, but NVIDIA-specific nonetheless) for
>> each window system to get equivalent performance if we pushed the
>> abstraction to a lower level.
>
> Hm, I'm not quite sure how this adds up. Streams + Switch +
> Streams/KMS interop is a _lot_ of complexity that goes buried into
> drivers, with no external visibility. I don't doubt your ability to
> get it right, but I _do_ doubt the ability of others to get this
> right. As you say, Streams is intended to make these problems go away,
> but it doesn't disappear, it just shifts elsewhere.

I agree with much of the above, but I don't think it's at odds with my 
statement.

Yes, something still needs to solve the problem of which type of buffer 
is best for the combination of producer X and consumer Y.  However, this 
is always going to be hardware-specific, so a vendor-specific backend is 
going to be the best place for it regardless of where that backend 
lives.  EGLSwitch (supporting multiple possible consumers, with one 
preferred) makes that decision more complex, but it doesn't change 
the HW-specific nature of the process.

Something needs to handle the operations that prepare a buffer for use 
on consumer Y after producer X has completed its work, and vice-versa. 
Again, what exactly those operations are is HW-specific, so they're 
going to live in HW-specific portions of the library (eglSwapBuffers(), 
or the Vulkan layout transitions + memory barriers).
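
As a hypothetical Vulkan-flavored illustration of such a preparation 
step, this is the kind of explicit transition a client records before 
handing a rendered image to the presentation engine; exactly what 
work it implies underneath is the HW-specific part:

#include <vulkan/vulkan.h>

/* Sketch: move one image from the layout the producer (the renderer)
 * wrote it in to the layout the consumer (presentation) reads it in.
 * Which caches get flushed or layouts get resolved here is up to
 * the hardware-specific driver. */
static void
record_present_transition(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_MEMORY_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = {
            .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
            .levelCount = 1,
            .layerCount = 1,
        },
    };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}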

The KMS interactions are trivial: filling in some framebuffer attributes 
on an atomic request.  The rest of the atomic request setup could still 
be done non-opaquely since, as you've pointed out, EGLStreams don't 
solve the overall configuration optimization problem.
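
Sketching that division of labor with libdrm's atomic API (the 
property ID is a placeholder, looked up beforehand via 
drmModeObjectGetProperties()), the HW-specific component would only 
need to contribute something like:

#include <xf86drmMode.h>

/* Sketch: the HW-specific component fills in just the FB_ID
 * property for a plane; CRTC routing, modes, and everything else on
 * the atomic request remain visible to, and controlled by, the
 * compositor. */
static int
attach_framebuffer(drmModeAtomicReq *req, uint32_t plane_id,
                   uint32_t prop_fb_id, uint32_t fb_id)
{
    return drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);
}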

Comparing:

(a) The minimal set (or as close to it as possible) of HW-specific
operations encapsulated in one object (a stream) that can be re-used
across various higher-level projects.

(b) Implementing several similar but slightly different window system
integration modules in each driver along with the above necessary 
encapsulations.

It seems to me that (a) results in less hidden, driver-specific code 
overall.

> I worry that, by
> the time you're done building out all the capability you're talking
> about on top of Streams, we'll end up with a spec that will be
> interpreted and implemented quite differently by every vendor.

The same could be said of any standard or API that attempts to address 
a complex use case.  We could agree to require standardized testing at 
the Khronos level (it wouldn't be the first time EGL conformance was 
suggested), or unofficially require piglit tests for the necessary 
stream extensions if that would help.  Arguably, Weston could act as 
the de facto conformance test too, though.

>>> Media falls down because currently there is no zerocopy binding from
>>> either hardware or software media decode engines. Perhaps not the case
>>> on your hardware, unusually blessed with a great deal of memory
>>> bandwidth, but a great many devices physically cannot cope with a
>>> single copy in the pipeline, given the ratio of content size to memory
>>> bandwidth. Doing this in EGL would require a 'draw' step which simply
>>> presented an existing buffer - a step which would unnecessarily
>>> involve the GPU if the pipeline is direct from decode to scanout - or
>>> it would involve having every media engine write their own bindings to
>>> the Streams protocol.
>>
>> Right.  Streams are meant to support lots of different producers and
>> consumers.
>
> Have you looked much at the media landscape, and discussed it with
> relevant projects - GStreamer, Kodi/XBMC, etc?

I haven't personally.  Others at NVIDIA are working on the multimedia 
aspects of streams.

>>> There are also incredibly exacting timing requirements for media
>>> display, which the Streams model of 'single permanently fixed latency'
>>> does not even come close to achieving. So for that you'd need another
>>> extension, to report actual achieved timings back. Wayland today
>>> fulfills these requirements with the zlinux_dmabuf and
>>> presentation_timing protocols, with the original hardware timings fed
>>> back through KMS.
>>
>>
>> Would it be reasonable to support such existing extensions while using
>> streams?
>
> Again, you'd need to add quite a bit of new API to Streams. In
> particular, every frame would need to gain two EGL objects: one for
> the producer which could be used to obtain presentation feedback, and
> one for the consumer which could be used to submit presentation
> feedback. And with this, you bang hard into EGL's lack of signalling,
> unless clients are expected to either poll or spin up a separate
> thread just to block.

The existing feedback mechanisms couldn't be used alongside streams 
without integrating them into EGL?  Streams just deliver frames, but it 
should be possible to correlate those frames with some external 
mechanism providing feedback on them.
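
For example (a sketch against the presentation-time protocol's 
generated client API; the header name and wiring follow the usual 
wayland-scanner conventions), the feedback is keyed off the surface 
commit, so it doesn't care how the buffer itself was delivered:

#include <wayland-client.h>
#include "presentation-time-client-protocol.h"

static void
feedback_sync_output(void *data,
                     struct wp_presentation_feedback *feedback,
                     struct wl_output *output)
{
    /* The output the frame was presented on; unused here. */
}

static void
feedback_presented(void *data,
                   struct wp_presentation_feedback *feedback,
                   uint32_t tv_sec_hi, uint32_t tv_sec_lo,
                   uint32_t tv_nsec, uint32_t refresh,
                   uint32_t seq_hi, uint32_t seq_lo, uint32_t flags)
{
    /* 'data' identifies the frame this hardware timestamp belongs
     * to, regardless of which path delivered the frame's buffer. */
    wp_presentation_feedback_destroy(feedback);
}

static void
feedback_discarded(void *data,
                   struct wp_presentation_feedback *feedback)
{
    wp_presentation_feedback_destroy(feedback);
}

static const struct wp_presentation_feedback_listener
feedback_listener = {
    .sync_output = feedback_sync_output,
    .presented = feedback_presented,
    .discarded = feedback_discarded,
};

/* Request feedback for the next commit on 'surface'; 'frame' is
 * whatever per-frame bookkeeping the client keeps. */
static void
request_frame_feedback(struct wp_presentation *presentation,
                       struct wl_surface *surface, void *frame)
{
    struct wp_presentation_feedback *fb =
        wp_presentation_feedback(presentation, surface);
    wp_presentation_feedback_add_listener(fb, &feedback_listener,
                                          frame);
}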

>>> Every additional codepath has its cost. Even if you just look at
>>> Mutter and Weston in a vacuum, it seems like it'll be quite the large
>>> patchset(s) by the time it's done, let alone extending it out to all
>>> the other compositors. This is a patchset which will need constant
>>> care and feeding: if it's not tested, it's broken. Right now, there is
>>> only one Streams implementation available, which is in a driver whose
>>> legal status is seen to be sufficiently problematic that it is not
>>> generally distributed by downstreams, which requires a whole set of
>>> external kernel patches to run. So even getting it to run is
>>> non-trivial.
>>>
>>> But then we'd have to do that in such a way that it was generally
>>> available, else any refactoring or changes we wanted to do internally
>>> would have to be blocked on testing/review from someone who knew that
>>> backend well enough. Either that, or it would just get broken.
>>> Introducing these codepaths has a very, very, real cost to the
>>> projects you're talking about.
>>
>>
>> If there were an open source implementation of streams, would that affect
>> your view?
>
> It would definitely make it significantly easier, especially as we
> work towards things like continuous integration (see kernelci.org -
> and then extend that upwards a bit). Something that is open, doesn't
> require non-mainline kernels (or at least has a path where you can see
> it working towards running on mainline), runs on real hardware, etc,
> would really make it much easier.
>
>>> You could quite rightly point to the Raspberry Pi DispManX backend as
>>> an example of the same, and you'd be right. And that's why I'm
>>> extremely enthused about how their new KMS/GBM driver allows us to
>>> nuke the entire backend from orbit, and reduce our testing load by
>>> shifting them to the generic driver.
>>
>>
>> I hope we can avoid an entirely forked compositor-drm/eglstream (and
>> especially gl-renderer) for these reasons.  The majority of the code is
>> still common and would be exercised using either path.
>
> Oh, I'm talking about a three-way split: gl-renderer-common.c,
> gl-renderer-eglimage.c, gl-renderer-eglstreams.c, and the same for
> compositor-drm.c. It's not reasonable to require you to write your own
> DRM backlight property handling, or Weston -> GL scene-graph
> transformation handling.

That does sound like a reasonable direction.  Would you consider such a 
refactoring palatable?

>>> It is unfortunate that you seem to discuss 'Streams' as an abstract
>>> concept of a cross-process swapchain which can be infinitely adjusted
>>> to achieve perfection, and yet 'GBM' gets discussed as a singular
>>> fixed-in-time thing which has all the flaws of just one of its
>>> particular platform implementations.
>>
>> I have a stronger understanding of the design direction for streams than I
>> do for GBM, and EGLStream is indeed intended to evolve towards the best
>> abstraction of a swapchain possible.  My views of GBM are based on the
>> current API.  I'm not that familiar with the Mesa implementation details.
>> I'd be happy to learn more about the direction the GBM API is taking in the
>> future, and that's half of what I was attempting to do in my
>> responses/questions here.
>
> Well, this thread is hopefully shaping it!
>
>>> I don't see how GBM could really perform any worse in such a design.
>>
>> The current GBM API is not expressive enough to support optimal buffer
>> allocation (at least on our hardware) in such a design.
>
> Currently, that's objectively true of both GBM and Streams. Both are
> going to need extension to work as hoped.

Yes.  Given that more work is needed (a lot more, apparently), my hope 
is to leverage that work as broadly as possible.  I hope NVIDIA's statements 
thus far have shown that a solution based on streams is more valuable in 
that regard than a solution spread across EGL, Wayland protocol, and GBM.

Thanks,
-James

> Cheers,
> Daniel
>

