[PATCH v6 3/5] dma-buf: Add ioctls to allow userspace to flush
Daniel Vetter
daniel at ffwll.ch
Fri Dec 18 07:29:58 PST 2015
On Thu, Dec 17, 2015 at 10:58:20PM +0100, Thomas Hellstrom wrote:
> On 12/16/2015 11:25 PM, Tiago Vignatti wrote:
> > From: Daniel Vetter <daniel.vetter at ffwll.ch>
> >
> > The userspace might need some sort of cache coherency management e.g. when CPU
> > and GPU domains are being accessed through dma-buf at the same time. To
> > circumvent this problem there are begin/end coherency markers, that forward
> > directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
> > of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
> > used like following:
> > - mmap dma-buf fd
> > - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
> > to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
> > want (with the new data being consumed by the GPU or say scanout device)
> > - munmap once you don't need the buffer any more
> >
> > v2 (Tiago): Fix header file type names (u64 -> __u64)
> > v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
> > dma-buf functions. Check for overflows in start/length.
> > v4 (Tiago): use 2d regions for sync.
> > v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
> > remove range information from struct dma_buf_sync.
> > v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
> > documentation about the recommendation on using sync ioctls.
> >
> > Cc: Sumit Semwal <sumit.semwal at linaro.org>
> > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> > Signed-off-by: Tiago Vignatti <tiago.vignatti at intel.com>
> > ---
> > Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++-
> > drivers/dma-buf/dma-buf.c | 43 +++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/dma-buf.h | 38 ++++++++++++++++++++++++++++++++++
> > 3 files changed, 102 insertions(+), 1 deletion(-)
> > create mode 100644 include/uapi/linux/dma-buf.h
> >
> > diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
> > index 4f4a84b..2ddd4b2 100644
> > --- a/Documentation/dma-buf-sharing.txt
> > +++ b/Documentation/dma-buf-sharing.txt
> > @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
> > handles, too). So it's beneficial to support this in a similar fashion on
> > dma-buf to have a good transition path for existing Android userspace.
> >
> > - No special interfaces, userspace simply calls mmap on the dma-buf fd.
> > + No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
> > + important to note though is that, even if it is not mandatory, the userspace
> > + is strongly recommended to always use the cache synchronization ioctl
> > + (DMA_BUF_IOCTL_SYNC) discussed next.
> > +
> > + Some systems might need some sort of cache coherency management e.g. when
> > + CPU and GPU domains are being accessed through dma-buf at the same time. To
> > + circumvent this problem there are begin/end coherency markers, that forward
> > + directly to existing dma-buf device drivers vfunc hooks. Userspace can make
> > + use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
> > + would be used like following:
> > + - mmap dma-buf fd
> > + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
> > + to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
> > + want (with the new data being consumed by the GPU or say scanout device)
> > + - munmap once you don't need the buffer any more
> > +
> > + In principle systems with the memory cache shared by the GPU and CPU may
> > + not need SYNC_START and SYNC_END but still, userspace is always encouraged
> > + to use these ioctls before and after, respectively, when accessing the
> > + mapped address.
> >
>
> I think the wording here is far too weak. If this is a generic
> user-space interface and syncing is required for:
> a) Correctness, then syncing must be mandatory.
> b) Optimal performance, then an implementation must generate expected
> results also in the absence of SYNC ioctls, but is allowed to rely on
> correct pairing of SYNC_START and SYNC_END to render correctly.
Yeah, I guess the wording should be stronger to make it clear that you
should always call these.
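For reference, the usage sequence from the commit message looks roughly
like this from userspace. This is only a sketch: the struct and flag
definitions below mirror the uapi proposed in this series and would
normally come from the new include/uapi/linux/dma-buf.h rather than
being redefined; error handling is minimal.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Mirrors the proposed include/uapi/linux/dma-buf.h; normally you
 * would #include <linux/dma-buf.h> instead of redefining these. */
struct dma_buf_sync {
	uint64_t flags;
};

#define DMA_BUF_SYNC_READ	(1 << 0)
#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_SYNC_RW		(DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
#define DMA_BUF_SYNC_START	(0 << 2)
#define DMA_BUF_SYNC_END	(1 << 2)

#define DMA_BUF_BASE		'b'
#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)

/* One CPU upload cycle: bracket the access to the mmap'ed area with
 * SYNC_START and SYNC_END so incoherent architectures can flush. */
static int cpu_write_cycle(int dmabuf_fd, void *map, const void *src,
			   size_t len)
{
	struct dma_buf_sync sync = { 0 };

	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -1;

	memcpy(map, src, len);	/* CPU access to the mmap'ed area */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```

The cycle can be repeated as often as needed before the final munmap,
with the GPU or scanout device consuming the data in between.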
> Also recalling the discussion of multidimensional sync, which we said we
> would add when it was needed, my worst nightmare is when (not if) there
> are a number of important applications starting to abuse this interface
> for small writes or reads to large DMA buffers. It will work flawlessly
> on coherent architectures, and probably some incoherent architectures as
> well, but will probably be mostly useless on VMware. What is the plan
> for adding 2D sync to make it work? How do we persuade app developers to
> think of damage regions, and can we count on support from the DRM
> community when this happens?
The current use-case in cros seems to be to do full-blown buffer uploads,
nothing partial. I don't think there's anything we can do really (flushing
for small uploads will make i915 crawl too, albeit you can still see a
slideshow and it's not a complete stop), except maybe improve the docs
with a statement that trying to use dma-buf mmap for small uploads will
result in horrible performance. I think the inevitable reality is that some
truly horrible software will always ship. I am pretty hopeful though that
this won't be too common, since the main users for anything dma-buf are on
mobile, and those systems (at least on i915) are all incoherent. So as
long as it'll run acceptably on i915 I think it should run at least ok-ish
on vmwgfx too.
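To make the small-uploads point concrete: the cheap pattern is one
SYNC_START/SYNC_END pair around the whole damage region, not one pair
per write. A hedged sketch, with the struct and flags again mirroring
the uapi proposed in this series:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Mirrors the proposed include/uapi/linux/dma-buf.h. */
struct dma_buf_sync {
	uint64_t flags;
};

#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_SYNC_START	(0 << 2)
#define DMA_BUF_SYNC_END	(1 << 2)
#define DMA_BUF_IOCTL_SYNC	_IOW('b', 0, struct dma_buf_sync)

/* Good: many small writes amortized under a single sync bracket.
 * Moving the START/END pair inside the loops instead could force a
 * flush per iteration on incoherent systems. */
static int upload_region(int dmabuf_fd, uint32_t *map, size_t pitch,
			 size_t x0, size_t y0, size_t w, size_t h,
			 uint32_t pixel)
{
	struct dma_buf_sync sync = { 0 };
	size_t x, y;

	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -1;

	for (y = y0; y < y0 + h; y++)
		for (x = x0; x < x0 + w; x++)
			map[y * pitch + x] = pixel;

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```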
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch