Direct userspace dma-buf mmap (v6)

Daniel Vetter daniel at ffwll.ch
Thu Dec 17 02:15:37 PST 2015


On Wed, Dec 16, 2015 at 08:25:32PM -0200, Tiago Vignatti wrote:
> Hi all,
> 
> The last version of this work was sent a while ago here:
> 
> http://lists.freedesktop.org/archives/dri-devel/2015-August/089263.html
> 
> So let's recap this series:
> 
>     1. it adds a vendor-independent client interface for mapping gem objects
>        through prime, IOW it implements userspace mmap() on a dma-buf fd.
>        This could be used for texturing from a CPU-rendered buffer, or for
>        passing buffers among processes without performing copies in userspace.
>     2. the series lets the client write to the mmap'ed memory, and
>     3. it deals with GPU and CPU cache synchronization.
> 
> Based on previous discussions, it seems that people are fine with 1. and 2.
> but not really with 3., given that cache coherency is a bit more boring to
> deal with.
> 
> It's easier to use this new infra on "coherent hardware" (systems where the
> memory cache is shared by the GPU and CPU) because they rarely need to use
> that kind of synchronization. But it would be much more convenient to have
> the very same interface exposed to clients regardless of whether the
> underlying hardware is cache coherent or not.
> 
> One idea that came up was to force clients to call the sync ioctls after the
> dma-buf is mmapped. But apparently there's no easy and performant way to do
> so, because it seems too costly to walk the page table entries and check the
> dirty bits. Also, depending on the order of the instructions sent to the
> devices, a sync call might be needed after the mapped region gets accessed as
> well, to flush all cachelines and make sure that, for example, the GPU domain
> won't read stale data. So that would make things even more complicated if we
> ever decide to go in the direction of forcing sync ioctls. The alternative
> therefore is to simply document it very well, with strong wording telling
> clients to use the sync ioctl regardless, otherwise they will misbehave. Do we
> have objections or maybe other, wiser ways to get around this? I made similar
> comments in August and no one has come up with better ideas.

I still think this is as good as it'll get. We can't force userspace to
behave without a serious perf hit, and without enforcing it all the time
there's not much use in it. Also there's the problem that mmap interfaces
in the kernel don't really allow you (at least easily) to intercept mmap
access.

It might make sense later on as a debug feature in case you do have a
coherency bug and want to know who screwed up. Similar to what exists for
the dma api.
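
For reference, the convention the series documents (bracket every CPU access
to the mapping with the sync ioctl) could look roughly like this from
userspace. This is only a minimal sketch: it assumes the uapi header
linux/dma-buf.h as found in mainline kernels (struct dma_buf_sync,
DMA_BUF_IOCTL_SYNC and the DMA_BUF_SYNC_* flags), which may differ in detail
from the v6 patches, and the helper names dmabuf_sync() and dmabuf_cpu_fill()
are made up for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>	/* assumed: mainline uapi header */

/* Hypothetical helper: issue one sync ioctl, retrying if interrupted. */
static int dmabuf_sync(int fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/*
 * Hypothetical helper: fill `size` bytes of a dma-buf from the CPU.
 * Returns 0 on success, -1 on failure (errno set by the failing call).
 */
static int dmabuf_cpu_fill(int fd, size_t size, int value)
{
	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	int ret;

	if (ptr == MAP_FAILED)
		return -1;

	/* Tell the exporter that CPU writes are about to start. */
	ret = dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE);
	if (ret == 0) {
		memset(ptr, value, size);
		/* Tell the exporter that CPU writes are done, so it can flush. */
		ret = dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
	}

	munmap(ptr, size);
	return ret;
}
```

Nothing in the kernel stops a client from skipping the two dmabuf_sync()
calls, which is exactly the point above: on non-coherent hardware the result
is simply stale data. Also note that the PROT_WRITE mapping only works if the
fd was exported read/write, which is what the O_RDWR patch in this series is
for.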

Quickly looked through the patches and looks really nice. Especially the
test coverage (including frontbuffer coherency checks for i915) is
awesome. Imo as soon as we have an ack from chromium upstream that they're
ok with this approach and your chromium patches, and after detailed code
review is done this can go in. An ack from Thomas Hellstrom on the
simplified coherency management interface would be good too.

Thanks, Daniel

> Lastly, the difference in the v6 series is that I've addressed the concerns
> pointed out in the igt tests, organized those changes a bit better (in
> smaller patches), documented the usage of the sync ioctls, and extensively
> tested this on different types of hardware.
> 
> https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf-mmap-v6
> https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v6
> 
> Tiago
> 
> 
> Daniel Thompson (1):
>   drm: prime: Honour O_RDWR during prime-handle-to-fd
> 
> Daniel Vetter (1):
>   dma-buf: Add ioctls to allow userspace to flush
> 
> Tiago Vignatti (3):
>   dma-buf: Remove range-based flush
>   drm/i915: Implement end_cpu_access
>   drm/i915: Use CPU mapping for userspace dma-buf mmap()
> 
>  Documentation/dma-buf-sharing.txt         | 41 +++++++++++++++-------
>  drivers/dma-buf/dma-buf.c                 | 56 ++++++++++++++++++++++++++-----
>  drivers/gpu/drm/drm_prime.c               | 10 ++----
>  drivers/gpu/drm/i915/i915_gem_dmabuf.c    | 42 +++++++++++++++++++++--
>  drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 +--
>  drivers/gpu/drm/udl/udl_fb.c              |  2 --
>  drivers/staging/android/ion/ion.c         |  6 ++--
>  drivers/staging/android/ion/ion_test.c    |  4 +--
>  include/linux/dma-buf.h                   | 12 +++----
>  include/uapi/drm/drm.h                    |  1 +
>  include/uapi/linux/dma-buf.h              | 38 +++++++++++++++++++++
>  11 files changed, 169 insertions(+), 47 deletions(-)
>  create mode 100644 include/uapi/linux/dma-buf.h
> 
> 
> And the igt changes:
> Rob Bradford (1):
>   prime_mmap: Add new test for calling mmap() on dma-buf fds
> 
> Tiago Vignatti (5):
>   lib: Add gem_userptr and __gem_userptr helpers
>   prime_mmap: Add basic tests to write in a bo using CPU
>   lib: Add prime_sync_start and prime_sync_end helpers
>   tests: Add kms_mmap_write_crc for cache coherency tests
>   tests: Add prime_mmap_coherency for cache coherency tests
> 
>  benchmarks/gem_userptr_benchmark.c |  55 +----
>  lib/ioctl_wrappers.c               |  92 +++++++
>  lib/ioctl_wrappers.h               |  32 +++
>  tests/Makefile.sources             |   3 +
>  tests/gem_userptr_blits.c          | 104 ++------
>  tests/kms_mmap_write_crc.c         | 281 +++++++++++++++++++++
>  tests/prime_mmap.c                 | 494 +++++++++++++++++++++++++++++++++++++
>  tests/prime_mmap_coherency.c       | 246 ++++++++++++++++++
>  8 files changed, 1180 insertions(+), 127 deletions(-)
>  create mode 100644 tests/kms_mmap_write_crc.c
>  create mode 100644 tests/prime_mmap.c
>  create mode 100644 tests/prime_mmap_coherency.c
> 
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

