about mmap dma-buf and sync

Tue Aug 25 02:30:09 PDT 2015

On 08/25/2015 11:02 AM, Daniel Vetter wrote:
> On Mon, Aug 24, 2015 at 04:52:13PM +0100, Daniel Stone wrote:
>> Hi Thomas,
>>
>> On 24 August 2015 at 10:50, Thomas Hellstrom <thomas at shipmail.org> wrote:
>>> In any case, Ideally I'd want the struct dma_buf_sync look something like
>>>
>>> enum dma_buf_sync_flags {
>>>         DMA_BUF_SYNC_READ = (1 << 0),
>>>         DMA_BUF_SYNC_WRITE = (2 << 0),
>>>         DMA_BUF_SYNC_RW = (3 << 0),
>>>         DMA_BUF_SYNC_START = (0 << 2),
>>>         DMA_BUF_SYNC_END = (1 << 2),
>>>
>>>         DMA_BUF_SYNC_VALID_FLAGS_MASK = DMA_BUF_SYNC_RW |
>>>                 DMA_BUF_SYNC_END
>>> };
>>>
>>>
>>> /* begin/end dma-buf functions used for userspace mmap. */
>>> struct dma_buf_sync {
>>>         enum dma_buf_sync_flags flags;
>>>         __u64 x;
>>>         __u64 width;
>>>
>>>         /* 2D Sync parameters */
>>>         __u64 stride;
>>>         __u64 y;
>>>         __u64 height;
>>>
>>> }
>>>
>>>
>>> So again, why use a 2D sync in the first place when it's probably possible to batch 1D syncs in the driver?
>>> It all boils down to the number of syscalls required, and also the fact that it's probably not trivial to batch sync calls in the read-write case. One can, again, argue that there might be 3D cases or other obscure sync cases,
>>> but in that case, let's implement them in the future if the overhead becomes unbearable. If the sync mechanism gets established as
>>> mandatory, the problem will solve itself. And at least now we cover the most common case and we don't need to add a new multiplexing
>>> mechanism in the sync IOCTL since we already have one in the IOCTL mechanism. We can simply add a new sync IOCTL for new obscure sync cases if desired.
>> I still don't think this ameliorates the need for batching: consider
>> the case where you update two disjoint screen regions and want them
>> both flushed. Either you issue two separate sync calls (which can be
>> disadvantageous performance-wise on some hardware / setups), or you
>> accumulate the regions and only flush later. So either two ioctls (one
>> in the style of dirtyfb and one to perform the sync/flush; you can
>> shortcut to assume the full buffer was damaged if the former is
>> missing), or one like this:
>>
>> struct dma_buf_sync_2d {
>>         enum dma_buf_sync_flags flags;
>>
>>         __u64 stride_bytes;
>>         __u32 bytes_per_pixel;
>>         __u32 num_regions;
>>
>>         struct {
>>                 __u64 x;
>>                 __u64 y;
>>                 __u64 width;
>>                 __u64 height;
>>         } regions[];
>> };
> I really feel like any kind of multi-range flush interface is feature
> bloat, and if we do it then we should only do it later on when there's a
> clear need for it.

IMO all the use-cases so far that wanted to do this have been 2D
updates. and having only a 1D sync will most probably scare people away
from this interface.

> Afaiui dma-buf mmap will mostly be used for up/downloads, which means
> there will be some explicit copy involved somewhere anyway. So similar to
> userptr usage. And for those most often you just slurp in a linear block
> and then let the blitter rearrange it, at least on i915.
>
> Also thus far the consensus was that dma-buf doesn't care/know about
> content layout, adding strides/bytes/whatever does bend this quite a bit.

I think with a 2D interface, the stride only applies to the sync
operation itself and is only a parameter for that sync operation.
Whether we should have multiple regions or not is not a big deal for me,
but I think at least a 2D sync is crucial.

>
> And finally if we concentrate on the simple-case of one byte range we can
> focus the discussion on correctness, which imo is the important bit here
> really.

If we only focus on correctness there are other ways to do this without
a sync: We can DMA based on accessed and dirty pages, but performance
will crawl, so IMO the sync interface is purely for performance...

> -Daniel

Thanks,
Thomas