Wayland generic dmabuf protocol

Wed Jun 11 23:01:56 PDT 2014

On Wed, 11 Jun 2014 12:00:57 -0400
Rob Clark <robdclark at gmail.com> wrote:

> On Mon, Jun 9, 2014 at 8:44 AM, Pekka Paalanen
> <pekka.paalanen at collabora.co.uk> wrote:
> > On Mon, 9 Jun 2014 12:23:18 +0100
> > Daniel Stone <daniel at fooishbar.org> wrote:
> >
> >> Hi,
> >>
> >> On 9 June 2014 12:06, Pekka Paalanen <pekka.paalanen at collabora.co.uk> wrote:
> >>
> >> > On Mon, 9 Jun 2014 11:00:04 +0200
> >> > Benjamin Gaignard <benjamin.gaignard at linaro.org> wrote:
> >> > > One of the main comments on the latest patches was that wl_dmabuf uses
> >> > > DRM for buffer allocation.
> >> > > This appears to be an issue since Wayland doesn't want to rely on one
> >> > > specific framework (DRM or V4L2) for buffer allocation, so we have
> >> > > started working on a "central dmabuf allocator" on the kernel side. The
> >> > > goal is to provide something as generic as possible, so that it is
> >> > > acceptable to Wayland.
> >> >
> >> > Why would Wayland need a central allocator for dmabuf?
> >> >
> >>
> >> I think you've just answered your own question further below:
> >>
> >>
> >> > > On my hardware the patches you have (+ this one on gstwaylandsink
> >> > > https://bugzilla.gnome.org/show_bug.cgi?id=711155) allow me to do
> >> > > zero-copy between the hardware video decoder and the display engine. I
> >> > > haven't implemented GPU support yet because my hardware is able to
> >> > > compose a few video overlay planes, and that was enough for my tests.
> >> >
> >> > Right.
> >> >
> >> > What I have been thinking is that the compositor must be able to use
> >> > the new wl_buffer and we need to guarantee that before-hand. If the
> >> > compositor fails to use a wl_buffer when the client has already
> >> > attached it to a wl_surface and it is time to repaint, it is too late
> >> > and the user will see a glitch. Recovering from that requires asking
> >> > the client to provide a new wl_buffer of a different kind, which might
> >> > take time. Or a very rude compositor would just send a protocol error,
> >> > and then we'd get bug reports like "the video player just disappears
> >> > when I try to play (and ps. I have an old kernel that doesn't support
> >> > importing whatever)".
> >> >
> >> > I believe we must allow the compositor to test the wl_buffer before it
> >> > is usable for the client. That is the reason for the roundtrippy design
> >> > of the below proposal.
> >> >
> >>
> >> A central allocator would solve these issues, by having everyone agree on
> >> the restrictions upfront, instead of working out which of the media decode
> >> engine, camera, GPU, or display controller is the lowest common
> >> denominator, and forcing all allocations through there.
> >>
> >> One such solution was discussed a while back WRT ION:
> >> https://lwn.net/Articles/565469/
> >>
> >> See the 'possible solutions' part for a way for people to agree on
> >> restrictions wrt tiling, stride, contiguousness, etc.
> >
> > Hi,
> >
> > that's an excellent article. I didn't know that delayed allocation of
> > dmabufs was not even possible yet, which would have allowed us to
> > not think about import failures and simply let the client fall back
> > with "ok, don't use dmabuf with this particular device then".
> 
> hrm?  I know of at least a couple of DRM drivers that defer allocation of
> backing pages...

I came across a bit harsh there. So it is possible, and a few drivers might
even do it already, but is there any intention of requiring all drivers
to be able to defer allocation?

Though if migration is going to work, the only downside of not
deferring allocation would be a performance penalty at the start,
right?
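
A toy Python model of that trade-off, under stated assumptions: one buffer, an exporter that is happy with any memory, and an importer (say, a display engine without an IOMMU) that needs contiguous memory. Device, ToyDmabuf, attach() and map() are hypothetical names, not kernel or libdrm APIs. Deferred allocation picks the storage once every device is attached; eager allocation guesses from the first device and pays one migration on first use, after which it costs nothing extra.

```python
# Toy model only: none of these names are kernel or libdrm APIs. It contrasts
# eager vs. deferred backing-page allocation for one shared buffer when an
# importer (e.g. a display engine without an IOMMU) needs contiguous memory.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    name: str
    needs_contiguous: bool


@dataclass
class ToyDmabuf:
    deferred: bool
    attached: List[Device] = field(default_factory=list)
    contiguous: Optional[bool] = None  # None: no backing pages allocated yet
    migrations: int = 0

    def attach(self, dev: Device) -> None:
        self.attached.append(dev)
        if not self.deferred and self.contiguous is None:
            # Eager: allocate right away, knowing only the first device.
            self.contiguous = dev.needs_contiguous

    def map(self) -> None:
        """First real use: the storage must now suit every attached device."""
        required = any(d.needs_contiguous for d in self.attached)
        if self.contiguous is None:
            # Deferred: all constraints are known, so allocate once, correctly.
            self.contiguous = required
        elif required and not self.contiguous:
            # The eager guess was wrong: pay for a one-off migration (copy).
            self.contiguous = True
            self.migrations += 1


decoder = Device("video-decoder", needs_contiguous=False)
display = Device("display-engine", needs_contiguous=True)
for deferred in (True, False):
    buf = ToyDmabuf(deferred=deferred)
    buf.attach(decoder)  # exporter
    buf.attach(display)  # importer
    buf.map()
    print(f"deferred={deferred}: migrations={buf.migrations}")
# prints "deferred=True: migrations=0" then "deferred=False: migrations=1"
```

Calling map() again on the eagerly allocated buffer does not migrate a second time, which matches the "penalty only at the start" reading.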

> > What is the conclusion here?
> >
> > The Wayland protocol does not need to consider import failures at all, and
> > can simply punt those as protocol errors, which essentially kill the app
> > if they ever happen?
> >
> > Do we need to wait for the central allocator in kernel to materialize
> > before we can design the protocol? Is it simply too early to try to do
> > it now?
> 
> I do tend to think the ION/central-allocator is just substituting one
> problem for another.  It doesn't really solve the problem of how
> different devices which don't actually know each other can decide on
> buffers that they can share.  On a phone/tablet/etc. you know up front,
> when building the kernel, what devices there are and in what use-cases
> they will be used, etc.  But that isn't really solving the more
> general case.

Right. Having followed the PC side a lot more than ARM or embedded in
the past, I too found a central allocator a little strange as the
final solution.

> > Was the idea of dmabuf in-kernel constraint negotiation with delayed
> > allocation rejected in favour of a central allocator?
> 
> not really, that I know of.  I still think we need to spiff up
> dma-mapping to better handle placement constraints.  (Although I still
> prefer format constraints to be a userspace topic.)

Sure. What I am specifically interested in is which things would be
left for user space to control and match, as that would affect the
Wayland protocol for dmabufs via APIs like GBM and V4L.
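
As a rough illustration of user space matching constraints itself, the sketch below intersects per-device restrictions of the kinds listed earlier in the thread (tiling/format, stride, contiguousness) into a lowest common denominator. The Constraints type, the format names and the alignment values are assumptions made up for the example; they are not what GBM or V4L2 actually report.

```python
# Hypothetical user-space constraint matching; format names, alignments and
# the Constraints type are illustrative, not taken from GBM or V4L2.
from dataclasses import dataclass
from math import gcd
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Constraints:
    formats: FrozenSet[str]  # pixel format / tiling layouts the device accepts
    stride_align: int        # required stride alignment in bytes
    contiguous: bool         # needs physically contiguous memory


def merge_all(devices: Iterable[Constraints]) -> Optional[Constraints]:
    """Lowest common denominator of all devices, or None if there is none."""
    result: Optional[Constraints] = None
    for c in devices:
        if result is None:
            result = c
            continue
        common = result.formats & c.formats
        if not common:
            return None  # no shared layout: zero-copy sharing is impossible
        a, b = result.stride_align, c.stride_align
        align = a * b // gcd(a, b)  # least common multiple of both alignments
        result = Constraints(common, align, result.contiguous or c.contiguous)
    return result


def aligned_stride(width: int, bytes_per_pixel: int, align: int) -> int:
    """Smallest stride >= width * bytes_per_pixel that is a multiple of align."""
    return -(-width * bytes_per_pixel // align) * align


decoder = Constraints(frozenset({"NV12", "NV12_TILED"}), 64, False)
display = Constraints(frozenset({"NV12", "XRGB8888"}), 256, True)
shared = merge_all([decoder, display])
print(sorted(shared.formats), shared.stride_align, shared.contiguous)
# prints "['NV12'] 256 True"
print(aligned_stride(1366, 1, shared.stride_align))  # luma plane; prints 1536
```

Whether placement (contiguity) belongs here or in dma-mapping, as suggested above, is exactly the open question; the sketch just keeps all three kinds of restriction in one place for illustration.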

> pengutronix is doing some work in this area:
> 
> http://elinux.org/images/b/b0/OSELAS.Presentation-DMABUF-migration.pdf

That is cool, and it also tells me that it is OK for the initial dmabuf
sharing and the creation of a wl_buffer protocol object to be expensive
(requiring one roundtrip per batch of buffers), as the setup may involve
migration even in the good case, and buffer re-use is strongly recommended.
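
A minimal sketch of the roundtrip-per-batch idea, assuming a hypothetical create request that gets one reply per buffer (a buffer handle, or a failure). ToyCompositor, ToyClientPool and the format names are illustrative only, not an existing Wayland interface. The point is that the compositor test-imports at creation time, so a failure reaches the client before it attaches anything, and the client then re-uses the same few buffers without further roundtrips.

```python
# Toy model of a "create wl_buffers from dmabufs" roundtrip. The request/event
# shape (one reply per buffer: a buffer or "failed") is an assumption for
# illustration, not an existing Wayland interface.
from itertools import count
from typing import List, Optional


class ToyCompositor:
    def __init__(self, importable_formats):
        self.importable = set(importable_formats)
        self.roundtrips = 0
        self._ids = count(1)

    def create_buffers(self, formats: List[str]) -> List[Optional[str]]:
        """One roundtrip for a whole batch: the compositor test-imports each
        dmabuf now, so a failure surfaces before anything is attached."""
        self.roundtrips += 1
        return [f"wl_buffer@{next(self._ids)}" if f in self.importable else None
                for f in formats]


class ToyClientPool:
    def __init__(self, compositor: ToyCompositor, fmt: str, size: int):
        replies = compositor.create_buffers([fmt] * size)
        # Any failure: fall back (e.g. to wl_shm) before the first attach.
        self.buffers = None if None in replies else replies
        self._frame = 0

    def next_buffer(self) -> str:
        """Re-use the same few buffers every frame; no further roundtrips."""
        buf = self.buffers[self._frame % len(self.buffers)]
        self._frame += 1
        return buf


comp = ToyCompositor({"NV12"})
pool = ToyClientPool(comp, "NV12", 3)
frames = [pool.next_buffer() for _ in range(300)]
print(comp.roundtrips, len(set(frames)))  # prints "1 3"
fallback = ToyClientPool(comp, "P010", 3)
print(fallback.buffers)  # prints "None": the client picks another path
```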

This raises a question in my mind.

A Wayland compositor must be able to use a dmabuf-based wl_buffer at
least on its fallback compositing path; let's say that is GLESv2, and we
are able to texture directly from the dmabuf. Then the compositor sees an
opportunity to promote the surface to a hardware overlay, and attempts
to, say, import the dmabuf a second time as a DRM FB. If it is not
possible to satisfy the exporter, EGL-import and DRM-import
restrictions all at the same time, and especially if exporter vs.
DRM-import would cause ping-ponging, it would be better to just let the
DRM import fail and continue with GLESv2 compositing.

Would you agree?
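
The policy described above could look roughly like this, assuming the EGL import already succeeded when the wl_buffer was created, so GLESv2 compositing is always available. pick_path and the import callback are hypothetical stand-ins for a real DRM FB import; the outcome is cached per buffer so a refused import is not retried every frame.

```python
# Hypothetical repaint-time policy. GLESv2 import is assumed to have succeeded
# at wl_buffer creation, so it is always available; the DRM FB import for an
# overlay is opportunistic. try_drm_import stands in for the real import.
from enum import Enum
from typing import Callable, Dict


class Path(Enum):
    OVERLAY = "drm-overlay"
    GLES = "glesv2-composite"


def pick_path(buffer_id: str,
              try_drm_import: Callable[[str], bool],
              drm_result: Dict[str, bool]) -> Path:
    """Remember the DRM import outcome per buffer so it is attempted once."""
    if buffer_id not in drm_result:
        # A refusal (e.g. because it would cause ping-ponging) is not fatal.
        drm_result[buffer_id] = try_drm_import(buffer_id)
    return Path.OVERLAY if drm_result[buffer_id] else Path.GLES


attempts = []


def fake_drm_import(buf: str) -> bool:
    """Hypothetical importer: only buffer "a" is scanout-compatible."""
    attempts.append(buf)
    return buf == "a"


cache: Dict[str, bool] = {}
paths = [pick_path(b, fake_drm_import, cache) for b in ["a", "b", "a", "b"]]
print([p.value for p in paths], attempts)
# prints "['drm-overlay', 'glesv2-composite', 'drm-overlay', 'glesv2-composite'] ['a', 'b']"
```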

Could the dmabuf-related interfaces somehow allow user space to
choose how much pain is tolerable for an import to succeed?
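
To make the question concrete, here is a purely hypothetical model of an import call that takes a tolerance. Nothing like this exists in the dmabuf interfaces being discussed; the storage kinds, the ImportCost scale and try_import are invented for illustration. An optional overlay import would pass ImportCost.FREE, while the mandatory fallback path might accept a one-off migration but never ping-ponging.

```python
# Purely hypothetical: dmabuf has no such "pain budget" interface. Storage
# kinds ("scatter", "contig") and the cost scale are illustrative only.
from enum import IntEnum
from typing import Set


class ImportCost(IntEnum):
    FREE = 0       # importer can use the current backing storage as-is
    MIGRATE = 1    # one-off move to storage both exporter and importer accept
    PING_PONG = 2  # no common storage: it would bounce between the two


def estimate_cost(current: str, exporter_ok: Set[str],
                  importer_ok: Set[str]) -> ImportCost:
    if current in importer_ok:
        return ImportCost.FREE
    if exporter_ok & importer_ok:
        return ImportCost.MIGRATE
    return ImportCost.PING_PONG


def try_import(current: str, exporter_ok: Set[str], importer_ok: Set[str],
               max_cost: ImportCost) -> bool:
    """Succeed only if the import costs no more than the caller tolerates."""
    return estimate_cost(current, exporter_ok, importer_ok) <= max_cost


display_ok = {"contig"}
# Optional overlay import: refuse anything but a free import.
print(try_import("scatter", {"scatter"}, display_ok, ImportCost.FREE))  # False
# Mandatory fallback path: a single migration would be acceptable.
print(try_import("scatter", {"scatter", "contig"}, display_ok,
                 ImportCost.MIGRATE))  # True
```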

Thanks,
pq