Introduction and updates from NVIDIA

Daniel Stone daniel at
Wed May 4 15:56:01 UTC 2016

Interleaving both replies ...

On 3 May 2016 at 19:44, James Jones <jajones at> wrote:
> On 05/03/2016 09:53 AM, Daniel Stone wrote:
>> On 3 May 2016 at 17:07, James Jones <jajones at> wrote:
>>> No, the necessary extensions cannot be contained within the binding.
>>> There is not enough information within the driver layer alone.
>>> Something needs to tell the driver when the configuration changes
>>> (e.g., the consumer of a wayland surface switches from a texture to a
>>> plane) and what the new configuration is. This would trigger the
>>> protocol notifications & subsequent optimization within the driver.
>>> By the nature of their API, streams would require the compositor to
>>> take action on such configuration changes, and streams can discover
>>> the new configuration. Something equivalent would be required to make
>>> this work in the GBM+wl_drm/EGL case.
>> I don't think this is the case. As I went through with Andy, we
>> _already_ have intent expressed in the GBM case, in the exact same way
>> that EGLStreams does: consider gbm_bo_import as equivalent for
>> attaching to an EGLOutput(Layer) consumer, and EGLImage import +
>> TargetTexture2D as equivalent for attaching a gltexture consumer.
> "Will be used for display on device X" is not sufficient information, as
> Daniel Vetter outlined.

Indeed, but nothing we have - including both the initial Streams
patchset, and the subsequent proposals for adding muxing as well as
KMS config passthrough - is sufficient for that.

The Streams Check/Commit proposal you outlined a couple of mails ago
isn't sufficient, because you often need to know the global
configuration to determine whether a given configuration is even
usable, let alone optimal: shared decompression/detiling units, global
bandwidth/watermark limits, etc. A single entrypoint into Streams that
only sees limited per-plane information isn't enough; Streams would
need visibility into the entire configuration.

So to actually make this work on other hardware, you'd need to pass
the full request (including content which came via other sources, e.g.
dmabuf) through to Streams. And by the time you're handing your entire
scene graph off to an external component to determine the optimal
configuration ... well, that's basically HWC.

I'm also not sure what the plan is for integrating with Vulkan
compositors: does that end up as an interop extension? Does VK WSI
gain an equivalent which allows you to mux swapchain/device_swapchain?
(Similar questions for the Check/Commit API really.)

>> This
>> is the exact same proxy for intent to display, and in fact the GBM
>> approach is slightly more flexible, because it allows you to both do
>> direct scanout as well as GPU composition (e.g. if you're
>> capturing/streaming at the same time as display).
>> Again though, without stream-retargeting, this is not something which
>> exists in Streams today, and doing so is going to require more
>> extensions: more code in your driver, more code in every
>> implementation. GBM today, for all its faults, does not require
>> further API extension to make this work.
> Agreed.  We're working on similar flexibility for streams via an EGLSwitch
> muxing extension.  As mentioned above, GBM would require API extensions and
> driver updates to reach the expressiveness of streams as well though.

Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)

> What streams exposes is intended to lower the amount of stuff hidden in
> drivers, not increase it.  Streams is a generic swapchain mechanism exposed
> to any user, whereas we would need to write something proprietary (maybe
> open source, maybe closed source, but NVIDIA-specific none the less) for
> each window system to get equivalent performance if we pushed the
> abstraction to a lower level.

Hm, I'm not quite sure how this adds up. Streams + Switch +
Streams/KMS interop is a _lot_ of complexity that gets buried in
drivers, with no external visibility. I don't doubt your ability to
get it right, but I _do_ doubt the ability of others to do so. As you
say, Streams is intended to make these problems go away, but the
complexity doesn't disappear; it just shifts elsewhere. I worry that,
by the time you're done building out all the capability you're talking
about on top of Streams, we'll end up with a spec that is interpreted
and implemented quite differently by every vendor.

>> Media falls down because currently there is no zerocopy binding from
>> either hardware or software media decode engines. Perhaps not the case
>> on your hardware, unusually blessed with a great deal of memory
>> bandwidth, but a great many devices physically cannot cope with a
>> single copy in the pipeline, given the ratio of content size to memory
>> bandwidth. Doing this in EGL would require a 'draw' step which simply
>> presented an existing buffer - a step which would unnecessarily
>> involve the GPU if the pipeline is direct from decode to scanout - or
>> it would involve having every media engine write their own bindings to
>> the Streams protocol.
> Right.  Streams are meant to support lots of different producers and
> consumers.

Have you looked much at the media landscape, and discussed it with
relevant projects - GStreamer, Kodi/XBMC, etc?

>> There are also incredibly exacting timing requirements for media
>> display, which the Streams model of 'single permanently fixed latency'
>> does not even come close to achieving. So for that you'd need another
>> extension, to report actual achieved timings back. Wayland today
>> fulfills these requirements with the zlinux_dmabuf and
>> presentation_timing protocols, with the original hardware timings fed
>> back through KMS.
> Would it be reasonable to support such existing extensions while using
> streams?

Again, you'd need to add quite a bit of new API to Streams. In
particular, every frame would need to gain two EGL objects: one for
the producer which could be used to obtain presentation feedback, and
one for the consumer which could be used to submit presentation
feedback. And with this, you bang hard into EGL's lack of signalling,
unless clients are expected to either poll or spin up a separate
thread just to block.

>> Every additional codepath has its cost. Even if you just look at
>> Mutter and Weston in a vacuum, it seems like it'll be quite the large
>> patchset(s) by the time it's done, let alone extending it out to all
>> the other compositors. This is a patchset which will need constant
>> care and feeding: if it's not tested, it's broken. Right now, there is
>> only one Streams implementation available, which is in a driver whose
>> legal status is seen to be sufficiently problematic that it is not
>> generally distributed by downstreams, which requires a whole set of
>> external kernel patches to run. So even getting it to run is
>> non-trivial.
>> But then we'd have to do that in such a way that it was generally
>> available, else any refactoring or changes we wanted to do internally
>> would have to be blocked on testing/review from someone who knew that
>> backend well enough. Either that, or it would just get broken.
>> Introducing these codepaths has a very, very, real cost to the
>> projects you're talking about.
> If there were an open source implementation of streams, would that affect
> your view?

It would definitely make things significantly easier, especially as we
work towards things like continuous integration (see -
and then extend that upwards a bit). Something that is open, doesn't
require non-mainline kernels (or at least has a visible path towards
running on mainline), runs on real hardware, etc., would help a great
deal.

>> You could quite rightly point to the Raspberry Pi DispManX backend as
>> an example of the same, and you'd be right. And that's why I'm
>> extremely enthused about how their new KMS/GBM driver allows us to
>> nuke the entire backend from orbit, and reduce our testing load by
>> shifting them to the generic driver.
> I hope we can avoid an entirely forked compositor-drm/eglstream (and
> especially gl-renderer) for these reasons.  The majority of the code is
> still common and would be exercised using either path.

Oh, I'm talking about a three-way split: gl-renderer-common.c,
gl-renderer-eglimage.c, gl-renderer-eglstreams.c, and the same for
compositor-drm.c. It's not reasonable to require you to write your own
DRM backlight property handling, or Weston -> GL scene-graph
transformation handling.

>> It is unfortunate that you seem to discuss 'Streams' as an abstract
>> concept of a cross-process swapchain which can be infinitely adjusted
>> to achieve perfection, and yet 'GBM' gets discussed as a singular
>> fixed-in-time thing which has all the flaws of just one of its
>> particular platform implementations.
> I have a stronger understanding of the design direction for streams than I
> do for GBM, and EGLStream is indeed intended to evolve towards the best
> abstraction of a swapchain possible.  My views of GBM are based on the
> current API.  I'm not that familiar with the Mesa implementation details.
> I'd be happy to learn more about the direction the GBM API is taking in the
> future, and that's half of what I was attempting to do in my
> responses/questions here.

Well, this thread is hopefully shaping it!

>> I don't see how GBM could really perform any worse in such a design.
> The current GBM API is not expressive enough to support optimal buffer
> allocation (at least on our hardware) in such a design.

Currently, that's objectively true of both GBM and Streams. Both are
going to need extension to work as hoped.


More information about the wayland-devel mailing list