[PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

Wed Nov 14 09:03:57 UTC 2018

On Tue, 13 Nov 2018 18:19:29 +0000
Simon Ser <contact at emersion.fr> wrote:

> > Hi Simon,
> >
> > On Fri, 2018-11-02 at 18:49 +0000, Simon Ser wrote:  
> > > On Friday, November 2, 2018 12:30 PM, Philipp Zabel <p.zabel at pengutronix.de> wrote:  
> > > > > > +    <event name="primary_device">
> > > > > > +      <description summary="preferred primary device">
> > > > > > +        This event advertizes the primary device that the server prefers. There
> > > > > > +        is exactly one primary device.  
> > > >
> > > > Which device should this be if the scanout engine is separate from the
> > > > render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)  
> > >
> > > When the surface hints are created, I expect the compositor to send the device
> > > it uses for compositing as the primary device (assuming it's using only one
> > > device).  
> >
> > i.MX6 has a separate scanout device without any acceleration capabilities
> > except some hardware overlay planes, and a pure GPU render device without
> > any connection to the outside world. The compositor uses both devices for
> > compositing and output.  
> 
> But most of the time, client buffers will go through compositing. So the
> primary device is still the render device.
> 
> The situation doesn't change a lot compared to wl_drm to be honest. The device
> that is advertised via wl_drm will be the primary device advertised by this
> protocol.
> 
> Maybe when the compositor decides to scan-out a client, it can switch the
> primary device to the scan-out device. Sorry, I don't know enough about these
> particular devices to say for sure.

Hi,

I do see Philipp's point after thinking for a while. I'll explain below.

> > > When the surface becomes fullscreen on a different GPU (meaning it becomes
> > > fullscreen on an output which is managed by another GPU), I'd expect the
> > > compositor to change the primary device for this surface to this other GPU.
> > >
> > > If the compositor uses multiple devices for compositing, it'll probably switch
> > > the primary device when the surface is moved from one GPU to the other.
> > >
> > > I'm not sure how i.MX6 works, but: even if the same GPU is used for compositing
> > > and scanout, but the compositing preferred formats are different from the
> > > scanout preferred formats, the compositor can update the preferred format
> > > without changing the preferred device.
> > >
> > > Is there an issue with this? Maybe something should be added to the protocol to
> > > explain it better?  
> >
> > It is not clear to me from the protocol description whether the primary
> > device means the scanout engine or the GPU, in case they are different.
> >
> > What is the client process supposed to do with this fd? Is it expected
> > to be able to render on this device? Or use it to allocate the optimal
> > buffers?  
> 
> The client is expected to allocate its buffers there. I'm not sure about
> rendering.

Well, actually...

> > > > What about contiguous vs non-contiguous memory?
> > > >
> > > > On i.MX6QP (Vivante GC3000) we would probably want the client to always
> > > > render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
> > > > read by both texture samplers (non-contiguous) and scanout (must be
> > > > contiguous).
> > > >
> > > > On i.MX6Q (Vivante GC2000) we always want to use the most efficient
> > > > DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
> > > > supported render formats can be sampled or scanned out directly.
> > > > Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
> > > > (non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
> > > > (contiguous) for scanout, the client buffers can always be non-
> > > > contiguous.
> > > >
> > > > On i.MX6S (Vivante GC880) the optimal render format for texture sampling
> > > > would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
> > > > DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
> > > > resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.  
> > >
> > > I think all of this works with Daniel's design.
> > >  
> > > > All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
> > > > buffers for scanout directly, but those would be suboptimal if the
> > > > compositor decides to render on short notice, because the client would
> > > > have already resolved into linear and then the compositor would have to
> > > > resolve back into a texture sampler tiling format.  
> > >
> > > Is the concern here that switching between scanout and compositing is
> > > non-optimal until the client chooses the preferred format?  
> >
> > My point is just that whether or not the buffer must be contiguous in
> > physical memory is the essential piece of information on i.MX6QP,
> > whereas the optimal tiling modifier is the same for both GPU composition
> > and direct scanout cases.
> >
> > If the client provides non-contiguous buffers, the "optimal" tiling
> > doesn't help one bit in the scanout case, as the scanout hardware can't
> > read from those.  
> 
> Sorry, I don't get what you mean. Can you please try to explain again?

The hints protocol we are discussing here is a subset of what
https://github.com/cubanismo/allocator aims to achieve. Originally we
concentrated only on choosing a better format and modifier, but the
question of where and how to allocate the buffers is valid too. Whether
that is in scope for this extension is the big question below.

Ideally, the protocol would do something like this:

- Tell the client which device must always be able to access the
  buffer, and for which use case. This is the minimum requirement.

- Tell the client that if it can also make the buffer suitable for a
  secondary device and a secondary use case, the compositor can do a
  better job (e.g. putting the buffer on direct scanout and bypassing
  composition, or feeding it to a hardware video encoder when the
  output is going to be streamed).
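
To make the two bullets above concrete, here is a rough sketch of the
required vs. optional split. Nothing in it is protocol: Target,
AllocationHints, choose_targets and the device and use-case strings are
all hypothetical names made up for illustration, and the can_satisfy
callback stands in for whatever the client's allocator can actually do.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Target:
    """A (device, use case) pair; both fields are hypothetical labels."""
    device: str    # stand-in for a DRM device
    use_case: str  # e.g. "texture", "scanout", "video-encode"


@dataclass
class AllocationHints:
    required: Target  # must always be able to access the buffer
    optional: List[Target] = field(default_factory=list)  # nice to have


def choose_targets(hints: AllocationHints,
                   can_satisfy: Callable[[List[Target]], bool]) -> List[Target]:
    """Return the targets the client will allocate for.

    The required target is always included. Optional targets are tried
    one at a time, in the compositor's order, and kept only if the
    client can still satisfy the whole set.
    """
    chosen = [hints.required]
    for target in hints.optional:
        if can_satisfy(chosen + [target]):
            chosen.append(target)
    return chosen


# Usage: a client whose allocator cannot produce encoder-compatible buffers.
hints = AllocationHints(
    required=Target("gpu", "texture"),
    optional=[Target("display", "scanout"), Target("vpu", "video-encode")],
)
chosen = choose_targets(
    hints, lambda targets: all(t.use_case != "video-encode" for t in targets))
print([t.use_case for t in chosen])
```

The sketch does not handle a client that cannot satisfy even the
required target; in that case the buffer is simply not usable by the
compositor.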

We don't have the vocabulary for use cases and there are tons of
different details to be taken into account, which is the whole point of
the allocator project. So we cannot do the complete solution here and
now, but we can do an approximate solution by negotiating pixel
formats and modifiers.
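
As a minimal sketch of what "negotiating pixel formats and modifiers"
could amount to on the client side, assuming the compositor sends its
format+modifier pairs in preference order (an assumption, not something
the current protocol text states): the client intersects that list with
what it can produce and takes the first hit. The negotiate() helper and
the string names are placeholders for the real DRM fourcc and 64-bit
modifier values, and the sets are illustrative only.

```python
def negotiate(compositor_prefs, client_supported):
    """Return the (format, modifier) pairs both sides support.

    The result keeps the compositor's preference order, so the first
    element is the best mutually supported choice.
    """
    supported = set(client_supported)
    return [pair for pair in compositor_prefs if pair in supported]


# Strings stand in for DRM_FORMAT_* and DRM_FORMAT_MOD_* values.
compositor_prefs = [
    ("XRGB8888", "VIVANTE_SUPER_TILED"),
    ("XRGB8888", "VIVANTE_TILED"),
    ("XRGB8888", "LINEAR"),
]
client_supported = [
    ("ARGB8888", "LINEAR"),
    ("XRGB8888", "LINEAR"),
    ("XRGB8888", "VIVANTE_TILED"),
]

usable = negotiate(compositor_prefs, client_supported)
best = usable[0] if usable else None
print(best)
```

An empty result means the client has no pair the compositor advertised
and has to fall back to some other buffer path.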

The primary device is what the compositor uses for the fallback path,
which is compositing with a GPU. Therefore, at the very minimum,
clients need to allocate buffers that can be used with the primary
device. We guarantee this in the zwp_linux_dmabuf protocol by having
the compositor test the buffer import into EGL (or equivalent) before
it accepts that the buffer even exists. The client does not strictly
need the primary device for this, but it has a much better chance of
producing usable buffers if it at least allocates on that device.

The primary device also has another, very different meaning: the
compositor will likely be using the primary device anyway, so it is
kept active, and if clients use the same device instead of some other
device, that probably results in considerable power savings. IOW, the
primary device is the preferred rendering device as well. Or so I
assume; these two concepts could also be decoupled.

A secondary device is optional. In a system where the GPU and display
devices are separate DRM devices, the GPU would be the primary device
and the display device would be the secondary device. So there seems to
be a use case for sending the secondary device (or devices?) in
addition to the primary device.
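
If a secondary device were sent along with its own modifier set, a
client could use it roughly as sketched here. This is only one possible
policy, and an assumed one: the buffer must work on the primary device,
and among the modifiers that do, one that also works on the secondary
device is preferred even if it ranks lower for the primary.
pick_modifier() and the modifier sets are hypothetical and do not
describe any real hardware.

```python
def pick_modifier(primary_mods, secondary_mods, client_mods):
    """Pick a modifier for a buffer.

    primary_mods is ordered by compositor preference. The result is
    always usable on the primary device; a modifier that is also usable
    on the secondary device wins if one exists. Returns None when the
    client cannot produce anything the primary device accepts.
    """
    client = set(client_mods)
    secondary = set(secondary_mods)
    candidates = [m for m in primary_mods if m in client]
    for modifier in candidates:
        if modifier in secondary:
            return modifier
    return candidates[0] if candidates else None


# Strings stand in for 64-bit DRM_FORMAT_MOD_* values.
gpu_mods = ["VIVANTE_SUPER_TILED", "VIVANTE_TILED", "LINEAR"]  # primary
display_mods = ["VIVANTE_SUPER_TILED", "LINEAR"]               # secondary
client_mods = ["VIVANTE_TILED", "LINEAR"]

print(pick_modifier(gpu_mods, display_mods, client_mods))
print(pick_modifier(gpu_mods, [], client_mods))
```

Note that modifiers alone do not capture the contiguous vs.
non-contiguous requirement Philipp described, so a sketch like this
cannot guarantee the buffer is actually scanout-capable.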

AFAIK, the Unix device memory allocator project does not yet have
anything we should be encoding as a Wayland extension, so all we seem
to be able to do is to deliver the device file descriptors and the
format+modifier sets.

Now the design question: do we want to communicate the secondary
devices in this extension? Quite likely we need a different extension
to be used with the allocator project.

Is communicating the display device fd useful already when it differs
from the rendering device? Is there a way for generic client userspace
to use it effectively, or would it rely on hardware-specific code in
clients rather than in e.g. Mesa drivers? Are there EGL or Vulkan APIs
to tell the driver it should make the buffer work on one device while
rendering on another?

My current opinion is that if there is no generic way for an
application to benefit from the secondary device fd, then we should not
add secondary devices in this extension yet.

Thanks,
pq