[Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

Daniel Vetter daniel at ffwll.ch
Wed Jan 29 21:25:51 CET 2014


On Tue, Jan 28, 2014 at 01:16:46PM +0000, Chris Wilson wrote:
> By exporting the ability to map user address and inserting PTEs
> representing their backing pages into the GTT, we can exploit UMA in order
> to utilize normal application data as a texture source or even as a
> render target (depending upon the capabilities of the chipset). This has
> a number of uses, with zero-copy downloads to the GPU and efficient
> readback making the intermixed streaming of CPU and GPU operations
> fairly efficient. This ability has many widespread implications from
> faster rendering of client-side software rasterisers (chromium),
> mitigation of stalls due to read back (firefox) and to faster pipelining
> of texture data (such as pixel buffer objects in GL or data blobs in CL).
> 
> v2: Compile with CONFIG_MMU_NOTIFIER
> v3: We can sleep while performing invalidate-range, which we can utilise
> to drop our page references prior to the kernel manipulating the vma
> (for either discard or cloning) and so protect normal users.
> v4: Only run the invalidate notifier if the range intercepts the bo.
> v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
> v6: Recheck after reacquire mutex for lost mmu.
> v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
> v8: Fix rebasing error after forwarding porting the back port.
> v9: Limit the userptr to page aligned entries. We now expect userspace
>     to handle all the offset-in-page adjustments itself.
> v10: Prevent vma from being copied across fork to avoid issues with cow.
> v11: Drop vma behaviour changes -- locking is nigh on impossible.
>      Use a worker to load user pages to avoid lock inversions.
> v12: Use get_task_mm()/mmput() for correct refcounting of mm.
> v13: Use a worker to release the mmu_notifier to avoid lock inversion
> v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
>      with its own locking and tree of objects for each mm/mmu_notifier.
> v15: Prevent overlapping userptr objects, and invalidate all objects
>      within the mmu_notifier range
> v16: Fix a typo for iterating over multiple objects in the range and
>      rearrange error path to destroy the mmu_notifier locklessly.
>      Also close a race between invalidate_range and the get_pages_worker.
> v17: Close a race between get_pages_worker/invalidate_range and fresh
>      allocations of the same userptr range - and notice that
>      struct_mutex was presumed to be held when during creation it wasn't.
> v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
>      for the struct sg_table and to clear it before reporting an error.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Cc: "Gong, Zhipeng" <zhipeng.gong at intel.com>
> Cc: Akash Goel <akash.goel at intel.com>
> Cc: "Volkin, Bradley D" <bradley.d.volkin at intel.com>

[snip]

> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 37c8073a8246..6c145a0be250 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -224,6 +224,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_I915_REG_READ		0x31
>  #define DRM_I915_GET_RESET_STATS	0x32
>  #define DRM_I915_GEM_CREATE2		0x33
> +#define DRM_I915_GEM_USERPTR		0x34
>  
>  #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
>  #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
> @@ -275,6 +276,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
>  #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
>  #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
> +#define DRM_IOCTL_I915_GEM_USERPTR			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_USERPTR, struct drm_i915_gem_userptr)
>  
>  /* Allow drivers to submit batchbuffers directly to hardware, relying
>   * on the security mechanisms provided by hardware.
> @@ -1129,4 +1131,18 @@ struct drm_i915_reset_stats {
>  	__u32 pad;
>  };
>  
> +struct drm_i915_gem_userptr {
> +	__u64 user_ptr;
> +	__u64 user_size;
> +	__u32 flags;
> +#define I915_USERPTR_READ_ONLY 0x1
> +#define I915_USERPTR_UNSYNCHRONIZED 0x80000000

So originally I've thought we need this due to the massive overhead of the
mmu notifier. But now with the nice shared mmu notifiers I've thought that
overhead is gone I prefer to also ditch this option.

Same goes about the MMU_NOTIFIER conditional code, imo we simply should
select this - most distros will have it anyway and users will be really
suprised if they lose userspace driver features for seemingly irrelevant
reasons.

Beside this I think I've run out of stuff to complain about ;-)

Cheers, Daniel

> +	/**
> +	 * Returned handle for the object.
> +	 *
> +	 * Object handles are nonzero.
> +	 */
> +	__u32 handle;
> +};
> +
>  #endif /* _UAPI_I915_DRM_H_ */
> -- 
> 1.8.5.3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch



More information about the Intel-gfx mailing list