about mmap dma-buf and sync

Wed Aug 19 23:48:23 PDT 2015

Hi, Tiago!

On 08/20/2015 12:33 AM, Tiago Vignatti wrote:
> Hey Thomas, you haven't answered my email about making SYNC_* mandatory:
>
> http://lists.freedesktop.org/archives/dri-devel/2015-August/088376.html

Hmm, for some reason it doesn't show up in my mail app, but I found it
in the archives. Here is an attempt to explain the situation from the
vmwgfx perspective.

The fact that the interface is generic means that people will start
using it for the zero-copy case. There have been a couple of more or
less hackish attempts to do this before. If it's a _driver_ interface we
don't need to be that careful, but if it is a _generic_ interface we
need to be very careful to make it fit *all* the hardware out there, and
to make sure all potential users use the interface in a way that
conforms with the interface specification.

What will happen otherwise is that apps written for coherent fast
hardware might, for example, skip calling the SYNC API, just because
the app writer only cared about his own hardware, on which the app works
fine. That would fail miserably if the same app were run on incoherent
hardware, or the incoherent hardware driver maintainers would be forced
to base an implementation on page faults, which would be very slow.

So assume the following use case: An app updates a 10x10 area using the
CPU on a 1600x1200 dma-buf, and it will then use the dma-buf for
texturing. On some hardware the dma-buf might be tiled in a very
specific way; on vmwgfx the dma-buf is a GPU buffer on the host, only
accessible using DMA. On vmwgfx the SYNC operation must carry out a
10x10 DMA from the host GPU buffer to a guest CPU buffer before the CPU
write, and a DMA back again after the write, before GPU usage. On the
tiled architecture the SYNC operation must untile before CPU access and
probably tile again before GPU access.
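
To make the vmwgfx case above concrete, here is a toy model of the
copy-in / copy-out pattern. It is only a sketch: plain memcpy() stands in
for DMA, both buffers are assumed to be linear with the same stride, and
every name (sync_rect, toy_sync_begin, toy_sync_end) is hypothetical and
not part of vmwgfx or the dma-buf API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rectangle in pixels; not a kernel or vmwgfx type. */
struct sync_rect {
    size_t x, y, w, h;
};

/* Copy one rectangle, row by row, between two linear buffers that share
 * the same stride (bytes per row) and bytes per pixel. */
static void copy_rect(unsigned char *dst, const unsigned char *src,
                      size_t stride, size_t bpp, struct sync_rect r)
{
    assert((r.x + r.w) * bpp <= stride);
    for (size_t row = 0; row < r.h; row++) {
        size_t off = (r.y + row) * stride + r.x * bpp;
        memcpy(dst + off, src + off, r.w * bpp);
    }
}

/* Before CPU access: pull the region from the "host" buffer into the
 * CPU-visible "guest" shadow (memcpy stands in for the DMA). */
void toy_sync_begin(unsigned char *guest, const unsigned char *host,
                    size_t stride, size_t bpp, struct sync_rect r)
{
    copy_rect(guest, host, stride, bpp, r);
}

/* After CPU access: push the same region back to the "host" buffer. */
void toy_sync_end(unsigned char *host, const unsigned char *guest,
                  size_t stride, size_t bpp, struct sync_rect r)
{
    copy_rect(host, guest, stride, bpp, r);
}
```

A CPU write that is not bracketed by the two calls never reaches the
"host" copy in this model, which is the failure mode an app would hit if
it skipped SYNC on such hardware.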

If we now have a one-dimensional SYNC api, in this particular case we'd
either need to sync a far too large area (1600x10) or call SYNC 10 times
before writing, and then again after writing. If the app forgot to call
SYNC we must error.
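
The cost difference can be worked out with a few lines of arithmetic.
The sketch below assumes a linear buffer with no row padding; the
image_layout struct and both helpers are hypothetical, and the pixel
size is a parameter since it is not fixed by the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a linear (untiled) image. */
struct image_layout {
    size_t width;   /* pixels per row */
    size_t height;  /* number of rows */
    size_t bpp;     /* bytes per pixel (an assumption, e.g. 4) */
};

/* Bytes covered by the tightest single contiguous 1D range that spans
 * the rectangle, from its first pixel to the end of its last pixel. */
size_t bytes_1d_span(const struct image_layout *img,
                     size_t x, size_t y, size_t w, size_t h)
{
    assert(w > 0 && h > 0);
    assert(x + w <= img->width && y + h <= img->height);
    size_t stride = img->width * img->bpp;
    size_t first = y * stride + x * img->bpp;
    size_t last_end = (y + h - 1) * stride + (x + w) * img->bpp;
    return last_end - first;
}

/* Bytes covered when the sync describes exactly the w x h rectangle. */
size_t bytes_2d_rect(const struct image_layout *img, size_t w, size_t h)
{
    return w * h * img->bpp;
}
```

With a 1600x1200 layout and an assumed 4 bytes per pixel, the 10x10
region needs 400 bytes as a 2D rectangle, while the tightest single 1D
range covers 57640 bytes, i.e. nearly ten full rows.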

So to summarize, the fact that the interface is generic IMO means:

1) Any user must be able to make valid assumptions about the internal
format of the dma-buf. (untiled, color format, stride etc.)
2) Any user *must* call SYNC before and after CPU access. On coherent
architectures, the SYNC is a no-op anyway, and that should be
documented somewhere so that maintainers of drivers for incoherent
architectures have somewhere to point their fingers.
3) Driver-specific implementations must be allowed to error (segfault)
if SYNC has not been used.
4) The use-case stated above clearly shows the benefit of a
2-dimensional sync interface (we want to sync the 10x10 region), but
what if someone in the future wants to use this interface for a 3D
texture? Will a 2D sync suffice? Can we make the SYNC interface
extendable in a way that an enum sync_type member defines the layout of
the argument, and initially we implement only 1d, 2d sync, leaving 3d
for the future?
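
One possible shape for the extensible argument in point 4 is a type tag
followed by a union. This is purely illustrative: enum sync_type is the
name suggested above, but the struct layout, the field names, the flags
member and sync_arg_check() are all assumptions, not a proposed ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* The type tag selects which union member is valid. */
enum sync_type {
    SYNC_TYPE_1D = 1,
    SYNC_TYPE_2D = 2,
    SYNC_TYPE_3D = 3   /* reserved for the future, not implemented */
};

struct sync_arg {
    uint32_t type;     /* one of enum sync_type */
    uint32_t flags;    /* hypothetical, e.g. access direction */
    union {
        struct { uint64_t offset, length; } r1d;           /* bytes  */
        struct { uint32_t x, y, width, height; } r2d;      /* pixels */
        struct { uint32_t x, y, z, width, height, depth; } r3d;
    } u;
};

/* Return 0 if the request is a supported type and lies inside the
 * buffer, -1 otherwise. Written so the bounds checks cannot overflow. */
int sync_arg_check(const struct sync_arg *arg, uint64_t buf_size,
                   uint32_t buf_w, uint32_t buf_h)
{
    switch (arg->type) {
    case SYNC_TYPE_1D:
        if (arg->u.r1d.length > buf_size ||
            arg->u.r1d.offset > buf_size - arg->u.r1d.length)
            return -1;
        return 0;
    case SYNC_TYPE_2D:
        if (arg->u.r2d.width > buf_w ||
            arg->u.r2d.x > buf_w - arg->u.r2d.width)
            return -1;
        if (arg->u.r2d.height > buf_h ||
            arg->u.r2d.y > buf_h - arg->u.r2d.height)
            return -1;
        return 0;
    default:
        /* 3D and unknown types are rejected until someone defines them. */
        return -1;
    }
}
```

Rejecting unknown types instead of ignoring them lets a 3D layout be
added later without changing the meaning of existing requests.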

Also, I agree there is probably no good way to generically implement an
error if SYNC has not been called. That needs to be left as an option to
drivers.

Thanks,
Thomas