about mmap dma-buf and sync

Thomas Hellstrom thomas at shipmail.org
Thu Aug 20 12:27:08 PDT 2015

On 08/20/2015 04:33 PM, Rob Clark wrote:
> On Thu, Aug 20, 2015 at 2:48 AM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
>> Hi, Tiago!
>> On 08/20/2015 12:33 AM, Tiago Vignatti wrote:
>>> Hey Thomas, you haven't answered my email about making SYNC_* mandatory:
>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/088376.html
>> Hmm, for some reason it doesn't show up in my mail app, but I found it
>> in the archives. An attempt to explain the situation from the vmwgfx
>> perspective.
>> The fact that the interface is generic means that people will start
>> using it for the zero-copy case. There have been a couple of more or less
>> hackish attempts to do this before, and if it's a _driver_ interface we
>> don't need to be that careful but if it is a _generic_ interface we need
>> to be very careful to make it fit *all* the hardware out there and that
>> we make all potential users use the interface in a way that conforms
>> with the interface specification.
>> What will happen otherwise is that apps written for coherent, fast
>> hardware might, for example, ignore calling the SYNC API, just because
>> the app writer only cared about his own hardware, on which the app works
>> fine. That would fail miserably if the same app was run on incoherent
>> hardware, or the incoherent hardware driver maintainers would be forced
>> to base an implementation on page faults, which would be very slow.
>> So assume the following use case: An app updates a 10x10 area using the
>> CPU on a 1600x1200 dma-buf, and it will then use the dma-buf for
>> texturing. On some hardware the dma-buf might be tiled in a very
>> specific way, on vmwgfx the dma-buf is a GPU buffer on the host, only
>> accessible using DMA. On vmwgfx the SYNC operation must carry out a
>> 10x10 DMA from the host GPU buffer to a guest CPU buffer before the CPU
>> write and a DMA back again after the write, before GPU usage. On the
>> tiled architecture the SYNC operation must untile before CPU access and
>> probably tile again before GPU access.
>> If we now have a one-dimensional SYNC API, in this particular case we'd
>> either need to sync a far too large area (1600x10) or call SYNC 10 times
>> before writing, and then again after writing. If the app forgot to call
>> SYNC we would have to return an error.
> just curious, but couldn't you batch up the 10 10x1 syncs?

Yes, that would work up to the first CPU access. Subsequent syncs would
need to be carried out immediately, or all PTEs would need to be unmapped
to detect the next CPU access. Write-only syncs could probably be
batched unconditionally.


More information about the dri-devel mailing list