about mmap dma-buf and sync
Thomas Hellstrom
thomas at shipmail.org
Thu Aug 20 12:32:18 PDT 2015
On 08/20/2015 09:27 PM, Thomas Hellstrom wrote:
> On 08/20/2015 04:33 PM, Rob Clark wrote:
>> On Thu, Aug 20, 2015 at 2:48 AM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
>>> Hi, Tiago!
>>>
>>> On 08/20/2015 12:33 AM, Tiago Vignatti wrote:
>>>> Hey Thomas, you haven't answered my email about making SYNC_* mandatory:
>>>>
>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/088376.html
>>> Hmm, for some reason it doesn't show up in my mail app, but I found it
>>> in the archives. Here's an attempt to explain the situation from the
>>> vmwgfx perspective.
>>>
>>> The fact that the interface is generic means that people will start
>>> using it for the zero-copy case. There have been a couple of more or
>>> less hackish attempts to do this before. If it were a _driver_ interface
>>> we wouldn't need to be that careful, but since it is a _generic_
>>> interface we need to be very careful to make it fit *all* the hardware
>>> out there, and to make sure that all potential users use the interface
>>> in a way that conforms with the interface specification.
>>>
>>> What will happen otherwise is that apps written for coherent, fast
>>> hardware might, for example, skip calling the SYNC API, just because
>>> the app writer only cared about his own hardware, on which the app works
>>> fine. Such an app would fail miserably if run on incoherent hardware, or
>>> else the maintainers of drivers for incoherent hardware would be forced
>>> to base an implementation on page faults, which would be very slow.
>>>
>>> So assume the following use case: an app updates a 10x10 area using the
>>> CPU on a 1600x1200 dma-buf, and then uses the dma-buf for texturing. On
>>> some hardware the dma-buf might be tiled in a very specific way; on
>>> vmwgfx the dma-buf is a GPU buffer on the host, accessible only using
>>> DMA. On vmwgfx the SYNC operation must carry out a 10x10 DMA from the
>>> host GPU buffer to a guest CPU buffer before the CPU write, and a DMA
>>> back again after the write, before GPU usage. On the tiled architecture
>>> the SYNC operation must untile before CPU access and probably tile again
>>> before GPU access.
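>>>
>>> To make that concrete, here's a rough sketch of the bracketing
>>> user-space would be expected to do, assuming a flags-only
>>> struct dma_buf_sync passed to a DMA_BUF_IOCTL_SYNC ioctl with
>>> START/END and READ/WRITE bits as in the proposed patches (I'm
>>> sketching names from memory, so they may not match the proposal
>>> exactly; error handling omitted):
>>>
>>> #include <linux/dma-buf.h>
>>> #include <sys/ioctl.h>
>>> #include <stdint.h>
>>> #include <string.h>
>>>
>>> /* Bracket CPU writes to an mmap'ed dma-buf with SYNC start/end, so an
>>>  * incoherent implementation (DMA on vmwgfx, untile/retile on tiled
>>>  * hardware) has somewhere to hook in. */
>>> static int cpu_update_10x10(int dmabuf_fd, uint8_t *map,
>>>                             unsigned stride, unsigned bpp,
>>>                             unsigned x, unsigned y)
>>> {
>>>         struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START |
>>>                                               DMA_BUF_SYNC_WRITE };
>>>         unsigned i;
>>>
>>>         if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync))
>>>                 return -1;
>>>
>>>         for (i = 0; i < 10; i++)
>>>                 memset(map + (y + i) * stride + x * bpp, 0xff, 10 * bpp);
>>>
>>>         sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
>>>         return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
>>> }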
>>>
>>> If we now have a one-dimensional SYNC API, in this particular case we'd
>>> either need to sync a far too large area (1600x10) or call SYNC 10 times
>>> before writing, and then again after writing. If the app forgot to call
>>> SYNC we must return an error.
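>>>
>>> To illustrate why that's awkward (the range struct and ioctl below are
>>> invented for this example; nothing like them exists), the 10x10 write
>>> on a 4 bpp buffer with a 1600-pixel pitch would look roughly like:
>>>
>>> /* Hypothetical 1D range-based sync - illustration only. */
>>> struct dma_buf_sync_range {
>>>         __u64 flags;   /* START/END, READ/WRITE as above */
>>>         __u64 offset;  /* byte offset into the dma-buf */
>>>         __u64 length;  /* bytes to sync from offset */
>>> };
>>>
>>> static void sync_rows_before_write(int fd, unsigned x, unsigned y)
>>> {
>>>         const unsigned stride = 1600 * 4, bpp = 4;
>>>         struct dma_buf_sync_range r = {
>>>                 .flags  = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE,
>>>                 .length = 10 * bpp,
>>>         };
>>>         unsigned i;
>>>
>>>         /* One call per row, since the ten rows aren't contiguous... */
>>>         for (i = 0; i < 10; i++) {
>>>                 r.offset = (y + i) * stride + x * bpp;
>>>                 ioctl(fd, DMA_BUF_IOCTL_SYNC_RANGE, &r); /* hypothetical */
>>>         }
>>>         /* ...or a single call with offset = y * stride and
>>>          * length = 10 * stride, i.e. a full 1600x10 strip. */
>>> }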
>> just curious, but couldn't you batch up the 10 10x1 syncs?
> Yes, that would work up to the first CPU access. Subsequent syncs would
> need to be carried out immediately, or all PTEs would need to be unmapped
> to detect the next CPU access. Write-only syncs could probably be
> batched unconditionally.
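>
> To make the batching idea concrete, a rough driver-side sketch (all the
> types and helpers below are made up for illustration; this is not vmwgfx
> code, and writing back a read/write region is omitted for brevity):
>
> #include <stdbool.h>
>
> struct sync_rect { unsigned x, y, w, h; };
>
> struct bo {
>         struct sync_rect pending[16];   /* batched write-only regions */
>         unsigned npending;
> };
>
> /* Stand-ins for the real guest<->host transfers. */
> static void dma_from_host(struct bo *bo, struct sync_rect r) { (void)bo; (void)r; }
> static void dma_to_host(struct bo *bo, struct sync_rect r) { (void)bo; (void)r; }
>
> /* Before the next GPU use of the buffer: flush everything batched. */
> static void flush_before_gpu(struct bo *bo)
> {
>         unsigned i;
>
>         for (i = 0; i < bo->npending; i++)
>                 dma_to_host(bo, bo->pending[i]);
>         bo->npending = 0;
> }
>
> /* SYNC start from user-space: write-only regions can simply be queued,
>  * but a region the CPU will also read must be made coherent right away
>  * (or the PTEs unmapped so the next access faults). */
> static void cpu_sync_start(struct bo *bo, struct sync_rect r, bool write_only)
> {
>         if (write_only) {
>                 if (bo->npending == 16)
>                         flush_before_gpu(bo);   /* list full: flush early */
>                 bo->pending[bo->npending++] = r;
>         } else {
>                 dma_from_host(bo, r);   /* CPU reads: must be coherent now */
>         }
> }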
>
> /Thomas
But aside from the problem of subsequent syncs after the first CPU access,
does user-space really want to call sync for each line? Probably not,
but that's a problem that can be postponed (2D sync getting a separate
IOCTL) until someone gets tired of calling 1D syncs. My feeling, however,
is that that will happen rather quickly and that at least 2D syncs will
be a common use case.
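
For reference, such a 2D sync would probably need to carry something like
the following (the struct below is invented for illustration; it's just
meant to spell out what a separate 2D IOCTL would have to know):

struct dma_buf_sync_2d {        /* hypothetical */
        __u64 flags;            /* START/END, READ/WRITE as for 1D sync */
        __u32 x, y;             /* origin of the rectangle, in pixels */
        __u32 width, height;    /* extent of the rectangle, in pixels */
        __u32 stride;           /* buffer pitch, in bytes */
        __u32 bpp;              /* bytes per pixel */
};
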
/Thomas
>
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel