Wayland generic dmabuf protocol

Thu Jun 12 08:01:11 PDT 2014

On Thu, Jun 12, 2014 at 2:01 AM, Pekka Paalanen
<pekka.paalanen at collabora.co.uk> wrote:
> On Wed, 11 Jun 2014 12:00:57 -0400
> Rob Clark <robdclark at gmail.com> wrote:
>
>> On Mon, Jun 9, 2014 at 8:44 AM, Pekka Paalanen
>> <pekka.paalanen at collabora.co.uk> wrote:
>> > On Mon, 9 Jun 2014 12:23:18 +0100
>> > Daniel Stone <daniel at fooishbar.org> wrote:
>> >
>> >> Hi,
>> >>
>> >> On 9 June 2014 12:06, Pekka Paalanen <pekka.paalanen at collabora.co.uk> wrote:
>> >>
>> >> > On Mon, 9 Jun 2014 11:00:04 +0200
>> >> > Benjamin Gaignard <benjamin.gaignard at linaro.org> wrote:
>> >> > > One of the main comments on the latest patches was that wl_dmabuf uses
>> >> > > DRM for buffer allocation.
>> >> > > This appears to be an issue since wayland doesn't want to rely on one
>> >> > > specific framework (DRM, or V4L2) for buffer allocation, so we have
>> >> > > started working on a "central dmabuf allocation" on kernel side. The
>> >> > > goal is to provide something as generic as possible to make it
>> >> > > acceptable to wayland.
>> >> >
>> >> > Why would Wayland need a central allocator for dmabuf?
>> >> >
>> >>
>> >> I think you've just answered your own question further below:
>> >>
>> >>
>> >> > > On my hardware the patches you have (+ this one on gstwaylandsink
>> >> > > https://bugzilla.gnome.org/show_bug.cgi?id=711155) allow me to do zero
>> >> > > copy between the hardware video decoder and the display engine. I
>> >> > > haven't implemented GPU support yet because my hardware is able to
>> >> > > compose a few video overlay planes and it was enough for my tests.
>> >> >
>> >> > Right.
>> >> >
>> >> > What I have been thinking is, that the compositor must be able to use
>> >> > the new wl_buffer and we need to guarantee that before-hand. If the
>> >> > compositor fails to use a wl_buffer when the client has already
>> >> > attached it to a wl_surface and it is time to repaint, it is too late
>> >> > and the user will see a glitch. Recovering from that requires asking
>> >> > the client to provide a new wl_buffer of a different kind, which might
>> >> > take time. Or a very rude compositor would just send a protocol error,
>> >> > and then we'd get bug reports like "the video player just disappears
>> >> > when I try to play (and ps. I have an old kernel that doesn't support
>> >> > importing whatever)".
>> >> >
>> >> > I believe we must allow the compositor to test the wl_buffer before it
>> >> > is usable for the client. That is the reason for the roundtrippy design
>> >> > of the below proposal.
>> >> >
>> >>
>> >> A central allocator would solve these issues, by having everyone agree on
>> >> the restrictions upfront, instead of working out which of the media decode
>> >> engine, camera, GPU, or display controller is the lowest common
>> >> denominator, and forcing all allocations through there.
>> >>
>> >> One such solution was discussed a while back WRT ION:
>> >> https://lwn.net/Articles/565469/
>> >>
>> >> See the 'possible solutions' part for a way for people to agree on
>> >> restrictions wrt tiling, stride, contiguousness, etc.
>> >
>> > Hi,
>> >
>> > that's an excellent article. I didn't know that delayed allocation of
>> > dmabufs was not even possible yet, which would have allowed us to
>> > not think about importing failures and simply let the client fall back
>> > with "ok, don't use dmabuf with this particular device then".
>>
>> hrm?  I know of at least a couple drm drivers that defer allocation of
>> backing pages..
>
> I came across a bit harsh there. So it is possible, and a few drivers might even
> do it already, but is there even an intention of requiring all drivers
> to be able to defer allocation?

not sure I'd go as far as to require it, but it is a pretty silly
optimization to skip..

> Though if migration is going to work, the only downside of not doing
> deferred allocation would be a performance penalty in the beginning,
> right?

right

>> > What is the conclusion here?
>> >
>> > Wayland protocol does not need to consider import failures at all, and
>> > can simply punt those as protocol errors, which essentially kill the app
>> > if they ever happen?
>> >
>> > Do we need to wait for the central allocator in kernel to materialize
>> > before we can design the protocol? Is it simply too early to try to do
>> > it now?
>>
>> I do tend to think the ION/central-allocator is just substituting one
>> problem for another.  It doesn't really solve the problem of how
>> different devices which don't actually know each other can decide on
>> buffers that they can share.  On a phone/tablet/etc you know up front
>> when building the kernel what devices there are and in what use-cases
>> they will be used, etc.  But that isn't really solving the more
>> general case.
>
> Right, as I have been following the PC side in the past a lot more than
> ARM or embedded, a central allocator seemed a little strange as the
> final solution to me too.
>
>> > Was the idea of dmabuf in-kernel constraint negotiation with delayed
>> > allocation rejected in favour of a central allocator?
>>
>> not really, that I know of.  I still think we need to spiff out
>> dma-mapping to better handle placement constraints.  (Although still
>> prefer format constraints to be a userspace topic.)
>
> Sure. What I am specifically interested in is which things would be
> left for user space to control and match, as that would affect the
> Wayland protocol for dmabufs via APIs like GBM and V4L.

I try to divide buffer constraints into two categories:
1) placement, ie. where the actual pages go (contiguous, special
memory range, etc)
2) format (fourcc, tiling format, pitch restrictions)

For most (all?) of the drm drivers, at the GEM level we do not
necessarily have any information about category #2.  All the kernel
cares about is category #1 in most cases.

Also, in at least some cases (gstreamer is a good example), there is
already a mechanism in place for negotiating #2.

This is my reasoning behind the conclusion that dmabuf (and kernel
level APIs) should care about #1, and userspace should care about #2.
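
To make the split concrete, here is a rough sketch. None of this is an
existing kernel, GEM or dma-buf API; the struct and function names are
made up for illustration. The kernel side would only need to merge
placement constraints (#1) across the devices sharing a buffer, while
the format description (#2) stays a userspace structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: category #1, placement. What the kernel would care about. */
struct placement_constraints {
	bool     contiguous; /* pages must be physically contiguous */
	size_t   alignment;  /* start alignment in bytes, power of two */
	uint64_t addr_min;   /* lowest usable address */
	uint64_t addr_max;   /* highest usable address, inclusive */
};

/* Hypothetical: category #2, format. Negotiated purely in userspace. */
struct format_constraints {
	uint32_t fourcc;      /* pixel format code */
	uint32_t tiling;      /* driver specific tiling mode */
	uint32_t pitch_align; /* required pitch alignment in bytes */
};

/*
 * Merge the placement constraints of two devices into *out.
 * Returns false if the two address ranges do not overlap, ie. no
 * single placement can satisfy both devices.
 */
bool placement_merge(const struct placement_constraints *a,
		     const struct placement_constraints *b,
		     struct placement_constraints *out)
{
	assert(a && b && out);

	/* If either device needs contiguous memory, the buffer does too. */
	out->contiguous = a->contiguous || b->contiguous;
	/* For power-of-two alignments the larger one satisfies both. */
	out->alignment = a->alignment > b->alignment ? a->alignment : b->alignment;
	/* Intersect the usable address ranges. */
	out->addr_min = a->addr_min > b->addr_min ? a->addr_min : b->addr_min;
	out->addr_max = a->addr_max < b->addr_max ? a->addr_max : b->addr_max;

	return out->addr_min <= out->addr_max;
}
```

Nothing in placement_merge() needs to know the fourcc or tiling, which
is the point of keeping #2 out of the kernel.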

>> pengutronix is doing some work in this area:
>>
>> http://elinux.org/images/b/b0/OSELAS.Presentation-DMABUF-migration.pdf
>
> That is cool, and it also tells me that it is ok for the initial dmabuf
> sharing and creating a wl_buffer protocol object to be expensive
> (require one roundtrip per batch of buffers), as the setup may involve
> migration even in a good case and buffer re-use is heavily recommended.
>
> This brings a question to my mind.
>
> A Wayland compositor must be able to use a dmabuf-based wl_buffer for
> at least its fallback compositing path, let's say GLESv2 and we are
> able to directly texture from the dmabuf. Then the compositor sees an
> opportunity to promote the surface to a hardware overlay, and attempts
> to, say, import the dmabuf a second time as a DRM FB. If it is not
> possible to satisfy all of exporter, EGL-import and DRM-import
> restrictions at the same time, and especially if exporter vs. DRM-import
> would cause ping-ponging, it would be better to just let the DRM-import
> fail, and continue with GLESv2 compositing.
>
> Would you agree?
>
> Could dmabuf related interfaces somehow allow for the user space to
> choose how much pain is tolerable for the import to succeed?
>

hmm, this is actually an interesting idea.  So far the assumption has
been that if you could not actually share buffers between devices,
userspace would do something different.

It seems like it would be worthwhile for userspace to know how
expensive sharing will be vs just using the window surface as a
texture..
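
Purely as a strawman: nothing like this exists in dmabuf today, and the
enum and function names below are invented. If an import attempt could
report a coarse cost, the compositor side of the decision might look
roughly like this, with GLESv2 compositing as the path that always
works:

```c
#include <assert.h>

/* Hypothetical cost classes, ordered from cheapest to worst. */
enum import_cost {
	IMPORT_COST_FREE,       /* buffer is usable where it already lives */
	IMPORT_COST_MIGRATE,    /* one-time migration of the backing pages */
	IMPORT_COST_PINGPONG,   /* pages would bounce between devices */
	IMPORT_COST_IMPOSSIBLE, /* constraints cannot be satisfied at all */
};

enum present_path {
	PRESENT_GLES,    /* fallback: texture from the dmabuf and composite */
	PRESENT_OVERLAY, /* promote the surface to a hardware overlay */
};

/*
 * Pick how to present a surface, given the (hypothetical) cost of
 * importing its dmabuf as a DRM FB and the worst cost the compositor
 * is willing to pay for an overlay.
 */
enum present_path choose_present_path(enum import_cost overlay_cost,
				      enum import_cost max_tolerated)
{
	/* Tolerating an impossible import makes no sense. */
	assert(max_tolerated != IMPORT_COST_IMPOSSIBLE);

	if (overlay_cost <= max_tolerated)
		return PRESENT_OVERLAY;

	/* Too painful or impossible: let the DRM import fail, use GLES. */
	return PRESENT_GLES;
}
```

A compositor that accepts a one-time migration but not ping-ponging
would pass IMPORT_COST_MIGRATE as max_tolerated.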

BR,
-R

>
> Thanks,
> pq