about mmap dma-buf and sync

Daniel Vetter daniel at ffwll.ch
Tue Aug 25 02:02:25 PDT 2015


On Mon, Aug 24, 2015 at 04:52:13PM +0100, Daniel Stone wrote:
> Hi Thomas,
> 
> On 24 August 2015 at 10:50, Thomas Hellstrom <thomas at shipmail.org> wrote:
> > In any case, Ideally I'd want the struct dma_buf_sync look something like
> >
> > enum dma_buf_sync_flags {
> >         DMA_BUF_SYNC_READ = (1 << 0),
> >         DMA_BUF_SYNC_WRITE = (2 << 0),
> >         DMA_BUF_SYNC_RW = (3 << 0),
> >         DMA_BUF_SYNC_START = (0 << 2),
> >         DMA_BUF_SYNC_END = (1 << 2),
> >
> >         DMA_BUF_SYNC_VALID_FLAGS_MASK = DMA_BUF_SYNC_RW |
> >                 DMA_BUF_SYNC_END
> > };
> >
> >
> > /* begin/end dma-buf functions used for userspace mmap. */
> > struct dma_buf_sync {
> >         enum dma_buf_sync_flags flags;
> >         __u64 x;
> >         __u64 width;
> >
> >         /* 2D Sync parameters */
> >         __u64 stride;
> >         __u64 y;
> >         __u64 height;
> >
> > };
> >
> >
> > So again, why use a 2D sync in the first place when it's probably
> > possible to batch 1D syncs in the driver? It boils down to the number
> > of syscalls required, and to the fact that batching sync calls is
> > probably not trivial in the read-write case. One can again argue that
> > there might be 3D or other obscure sync cases, but let's implement
> > those in the future if the overhead ever becomes unbearable. If the
> > sync mechanism gets established as mandatory, the problem will solve
> > itself. At least this covers the most common case, and we don't need
> > to add a new multiplexing mechanism inside the sync IOCTL since the
> > IOCTL mechanism already provides one: we can simply add a new sync
> > IOCTL for obscure cases if desired.
> 
> I still don't think this obviates the need for batching: consider
> the case where you update two disjoint screen regions and want them
> both flushed. Either you issue two separate sync calls (which can be
> disadvantageous performance-wise on some hardware / setups), or you
> accumulate the regions and only flush later. So either two ioctls (one
> in the style of dirtyfb and one to perform the sync/flush; you can
> shortcut to assume the full buffer was damaged if the former is
> missing), or one like this:
> 
> struct dma_buf_sync_2d {
>         enum dma_buf_sync_flags flags;
> 
>         __u64 stride_bytes;
>         __u32 bytes_per_pixel;
>         __u32 num_regions;
> 
>         struct {
>                 __u64 x;
>                 __u64 y;
>                 __u64 width;
>                 __u64 height;
>         } regions[];
> };

I really feel like any kind of multi-range flush interface is feature
bloat, and if we do it then we should only do it later on when there's a
clear need for it.

Afaiui dma-buf mmap will mostly be used for uploads and downloads, which
means there will be some explicit copy involved somewhere anyway, similar
to userptr usage. And for those you most often just slurp in a linear
block and then let the blitter rearrange it, at least on i915.
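
As a concrete illustration of that linear up/download pattern, a sketch
might look like the following; the single-range struct roughly follows
Thomas' proposal quoted above, and the ioctl number is an assumption
rather than merged uapi.

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/types.h>

#define DMA_BUF_SYNC_WRITE (2 << 0)
#define DMA_BUF_SYNC_START (0 << 2)
#define DMA_BUF_SYNC_END   (1 << 2)

struct dma_buf_sync {
        __u32 flags;                    /* DMA_BUF_SYNC_* */
        __u64 x;                        /* byte offset of the range */
        __u64 width;                    /* length of the range in bytes */
        __u64 stride, y, height;        /* 2D parameters, unused here */
};

/* Made-up ioctl number, for this sketch only. */
#define DMA_BUF_IOCTL_SYNC _IOW('b', 0, struct dma_buf_sync)

/* Copy a linear block into the buffer; the blitter can rearrange it later. */
static int upload_linear(int dmabuf_fd, const void *src, size_t len)
{
        struct dma_buf_sync sync = {
                .flags = DMA_BUF_SYNC_WRITE | DMA_BUF_SYNC_START,
                .x = 0,
                .width = len,
        };
        void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                         dmabuf_fd, 0);

        if (map == MAP_FAILED)
                return -1;

        /* Begin CPU access: wait for the device, invalidate CPU caches. */
        if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync))
                goto out_unmap;

        memcpy(map, src, len);

        /* End CPU access: flush CPU caches so the device sees the data. */
        sync.flags = DMA_BUF_SYNC_WRITE | DMA_BUF_SYNC_END;
        if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync))
                goto out_unmap;

        munmap(map, len);
        return 0;

out_unmap:
        munmap(map, len);
        return -1;
}

The important part is only that the memcpy is bracketed by a begin and an
end call covering the touched byte range.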

Also, thus far the consensus has been that dma-buf doesn't care or know
about content layout; adding strides, bytes-per-pixel and the like bends
that quite a bit.

And finally, if we concentrate on the simple case of one byte range, we
can focus the discussion on correctness, which imo is the important bit
here.
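
Purely to illustrate that reduced scope: dropping the 2D members from the
struct proposed above would leave a uapi along these lines (again just a
sketch, not anything that exists):

#include <linux/types.h>

/* Hypothetical single-range sync uapi: just a byte range plus flags saying
 * whether CPU access is starting or ending, and whether it reads or writes. */
struct dma_buf_sync_range {
        __u64 flags;    /* DMA_BUF_SYNC_{READ,WRITE,RW} | {START,END} */
        __u64 offset;   /* start of the CPU-accessed range, in bytes */
        __u64 length;   /* length of the range, in bytes */
};

Notably there is no stride or bytes-per-pixel anywhere, so dma-buf stays
agnostic about content layout.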
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

