[PATCH v6 3/5] dma-buf: Add ioctls to allow userspace to flush
Daniel Vetter
daniel at ffwll.ch
Fri Dec 18 07:29:58 PST 2015
On Thu, Dec 17, 2015 at 10:58:20PM +0100, Thomas Hellstrom wrote:
> On 12/16/2015 11:25 PM, Tiago Vignatti wrote:
> > From: Daniel Vetter <daniel.vetter at ffwll.ch>
> >
> > The userspace might need some sort of cache coherency management e.g. when CPU
> > and GPU domains are being accessed through dma-buf at the same time. To
> > circumvent this problem there are begin/end coherency markers, that forward
> > directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
> > of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
> > used like following:
> > - mmap dma-buf fd
> > - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
> > to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
> > want (with the new data being consumed by the GPU or say scanout device)
> > - munmap once you don't need the buffer any more
> >
> > v2 (Tiago): Fix header file type names (u64 -> __u64)
> > v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
> > dma-buf functions. Check for overflows in start/length.
> > v4 (Tiago): use 2d regions for sync.
> > v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
> > remove range information from struct dma_buf_sync.
> > v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
> > documentation about the recommendation on using sync ioctls.
> >
> > Cc: Sumit Semwal <sumit.semwal at linaro.org>
> > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> > Signed-off-by: Tiago Vignatti <tiago.vignatti at intel.com>
> > ---
> > Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++-
> > drivers/dma-buf/dma-buf.c | 43 +++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/dma-buf.h | 38 ++++++++++++++++++++++++++++++++++
> > 3 files changed, 102 insertions(+), 1 deletion(-)
> > create mode 100644 include/uapi/linux/dma-buf.h
> >
> > diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
> > index 4f4a84b..2ddd4b2 100644
> > --- a/Documentation/dma-buf-sharing.txt
> > +++ b/Documentation/dma-buf-sharing.txt
> > @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
> > handles, too). So it's beneficial to support this in a similar fashion on
> > dma-buf to have a good transition path for existing Android userspace.
> >
> > - No special interfaces, userspace simply calls mmap on the dma-buf fd.
> > + No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
> > + important to note though is that, even if it is not mandatory, the userspace
> > + is strongly recommended to always use the cache synchronization ioctl
> > + (DMA_BUF_IOCTL_SYNC) discussed next.
> > +
> > + Some systems might need some sort of cache coherency management e.g. when
> > + CPU and GPU domains are being accessed through dma-buf at the same time. To
> > + circumvent this problem there are begin/end coherency markers, that forward
> > + directly to existing dma-buf device drivers vfunc hooks. Userspace can make
> > + use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
> > + would be used like following:
> > + - mmap dma-buf fd
> > + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
> > + to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
> > + want (with the new data being consumed by the GPU or say scanout device)
> > + - munmap once you don't need the buffer any more
> > +
> > + In principle systems with the memory cache shared by the GPU and CPU may
> > + not need SYNC_START and SYNC_END but still, userspace is always encouraged
> > + to use these ioctls before and after, respectively, when accessing the
> > + mapped address.
> >
>
> I think the wording here is far too weak. If this is a generic
> user-space interface and syncing is required for:
> a) Correctness, then syncing must be mandatory.
> b) Optimal performance, then an implementation must generate expected
> results also in the absence of SYNC ioctls, but is allowed to rely on
> correct pairing of SYNC_START and SYNC_END to render correctly.
Yeah, I guess the wording should be stronger to make it clear that you
should always call these.
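For reference, the usage sequence from the commit message looks roughly
like this from userspace. This is only a sketch: the struct and flag
definitions below mirror the uapi proposed in this series and would
normally come from the new include/uapi/linux/dma-buf.h rather than
being redefined; error handling is minimal.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Mirrors the proposed include/uapi/linux/dma-buf.h; normally you
 * would #include <linux/dma-buf.h> instead of redefining these. */
struct dma_buf_sync {
	uint64_t flags;
};

#define DMA_BUF_SYNC_READ	(1 << 0)
#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_SYNC_RW		(DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
#define DMA_BUF_SYNC_START	(0 << 2)
#define DMA_BUF_SYNC_END	(1 << 2)

#define DMA_BUF_BASE		'b'
#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)

/* One CPU upload cycle: bracket the access to the mmap'ed area with
 * SYNC_START and SYNC_END so incoherent architectures can flush. */
static int cpu_write_cycle(int dmabuf_fd, void *map, const void *src,
			   size_t len)
{
	struct dma_buf_sync sync = { 0 };

	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -1;

	memcpy(map, src, len);	/* CPU access to the mmap'ed area */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```

The cycle can be repeated as often as needed before the final munmap,
with the GPU or scanout device consuming the data in between.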
> Also recalling the discussion of multidimensional sync, which we said we
> would add when it was needed, my worst nightmare is when (not if) there
> are a number of important applications starting to abuse this interface
> for small writes or reads to large DMA buffers. It will work flawlessly
> on coherent architectures, and probably some incoherent architectures as
> well, but will probably be mostly useless on VMware. What is the plan
> for adding 2D sync to make it work? How do we persuade app developers to
> think of damage regions, and can we count on support from the DRM
> community when this happens?
The current use-case in cros seems to be to do full-blown buffer uploads,
nothing partial. I don't think there's anything we can do really (flushing
for small uploads will make i915 crawl too, albeit you can still see a
slideshow and it's not a complete stop), except maybe improve the docs
with a statement that trying to use dma-buf mmap for small uploads will
result in horrible performance. I think the inevitable reality is that some
truly horrible software will always ship. I am pretty hopeful though that
this won't be too common, since the main users for anything dma-buf are on
mobile, and those systems (at least on i915) are all incoherent. So as
long as it'll run acceptably on i915 I think it should run at least ok-ish
on vmwgfx too.
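To make the small-uploads point concrete: the cheap pattern is one
SYNC_START/SYNC_END pair around the whole damage region, not one pair
per write. A hedged sketch, with the struct and flags again mirroring
the uapi proposed in this series:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Mirrors the proposed include/uapi/linux/dma-buf.h. */
struct dma_buf_sync {
	uint64_t flags;
};

#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_SYNC_START	(0 << 2)
#define DMA_BUF_SYNC_END	(1 << 2)
#define DMA_BUF_IOCTL_SYNC	_IOW('b', 0, struct dma_buf_sync)

/* Good: many small writes amortized under a single sync bracket.
 * Moving the START/END pair inside the loops instead could force a
 * flush per iteration on incoherent systems. */
static int upload_region(int dmabuf_fd, uint32_t *map, size_t pitch,
			 size_t x0, size_t y0, size_t w, size_t h,
			 uint32_t pixel)
{
	struct dma_buf_sync sync = { 0 };
	size_t x, y;

	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -1;

	for (y = y0; y < y0 + h; y++)
		for (x = x0; x < x0 + w; x++)
			map[y * pitch + x] = pixel;

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```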
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch