[Intel-gfx] [PATCH] drm/i915: Support creation of unbound wc user mappings for objects

Wed Dec 17 04:46:44 PST 2014

Hi,

On 10/23/2014 05:55 PM, Chris Wilson wrote:
> From: Akash Goel <akash.goel at intel.com>
>
> This patch provides support to create write-combining virtual mappings of
> GEM object. It intends to provide the same funtionality of 'mmap_gtt'
> interface without the constraints and contention of a limited aperture
> space, but requires clients handles the linear to tile conversion on their
> own. This is for improving the CPU write operation performance, as with such
> mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
> to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
> flush after update from CPU side, when object is passed onto GPU.  This
> type of mapping is specially useful in case of sub-region update,
> i.e. when only a portion of the object is to be updated. Using a CPU mmap
> in such cases would normally incur a clflush of the whole object, and
> using a GTT mmapping would likely require eviction of an active object or
> fence and thus stall. The write-combining CPU mmap avoids both.
>
> To ensure the cache coherency, before using this mapping, the GTT domain
> has been reused here. This provides the required cache flush if the object
> is in CPU domain or synchronization against the concurrent rendering.
> Although the access through an uncached mmap should automatically
> invalidate the cache lines, this may not be true for non-temporal write
> instructions and also not all pages of the object may be updated at any
> given point of time through this mapping.  Having a call to get_pages in
> set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
> Broaden application of set-domain(GTT)', would guarantee the clflush and
> so there will be no cachelines holding the data for the object before it
> is accessed through this map.
>
> The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
> extended with a new flags field (defaulting to 0 for existent users). In
> order for userspace to detect the extended ioctl, a new parameter
> I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.
>
> v2: Fix error handling, invalid flag detection, renaming (ickle)
>
> The new mmapping is exercised by igt/gem_mmap_wc,
> igt/gem_concurrent_blit and igt/gem_gtt_speed.
>
> Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> ---
>   drivers/gpu/drm/i915/i915_dma.c |  3 +++
>   drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++++++++++++
>   include/uapi/drm/i915_drm.h     |  9 +++++++++
>   3 files changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 1b398070b230..496fb72e25c8 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1027,6 +1027,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>   	case I915_PARAM_CMD_PARSER_VERSION:
>   		value = i915_cmd_parser_get_version();
>   		break;
> +	case I915_PARAM_MMAP_VERSION:
> +		value = 1;
> +		break;
>   	default:
>   		DRM_DEBUG("Unknown parameter %d\n", param->param);
>   		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 895f9881f0aa..7d7265fcee95 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1485,6 +1485,12 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
>   	struct drm_gem_object *obj;
>   	unsigned long addr;
>
> +	if (args->flags & ~(I915_MMAP_WC))
> +		return -EINVAL;
> +
> +	if (args->flags & I915_MMAP_WC && !cpu_has_pat)
> +		return -ENODEV;
> +
>   	obj = drm_gem_object_lookup(dev, file, args->handle);
>   	if (obj == NULL)
>   		return -ENOENT;
> @@ -1500,6 +1506,19 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
>   	addr = vm_mmap(obj->filp, 0, args->size,
>   		       PROT_READ | PROT_WRITE, MAP_SHARED,
>   		       args->offset);
> +	if (args->flags & I915_MMAP_WC) {
> +		struct mm_struct *mm = current->mm;
> +		struct vm_area_struct *vma;
> +
> +		down_write(&mm->mmap_sem);
> +		vma = find_vma(mm, addr);
> +		if (vma)
> +			vma->vm_page_prot =
> +				pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> +		else
> +			addr = -ENOMEM;
> +		up_write(&mm->mmap_sem);
> +	}
>   	drm_gem_object_unreference_unlocked(obj);
>   	if (IS_ERR((void *)addr))
>   		return addr;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index ff57f07c3249..9b3762724a2b 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
>   #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
>   #define I915_PARAM_HAS_WT     	 	 27
>   #define I915_PARAM_CMD_PARSER_VERSION	 28
> +#define I915_PARAM_MMAP_VERSION          29
>
>   typedef struct drm_i915_getparam {
>   	int param;
> @@ -487,6 +488,14 @@ struct drm_i915_gem_mmap {
>   	 * This is a fixed-size type for 32/64 compatibility.
>   	 */
>   	__u64 addr_ptr;
> +
> +	/**
> +	 * Flags for extended behaviour.
> +	 *
> +	 * Added in version 2.
> +	 */
> +	__u64 flags;
> +#define I915_MMAP_WC 0x1
>   };
>
>   struct drm_i915_gem_mmap_gtt {

I had to spend some time reading about stuff but eventually this looks 
good to me. So:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Just one thing to clarify - the same write combining speed up effect is 
already possible using the mentioned non-temporal instructions, this 
just makes it easier to use for a wider group of users?

Regards,

Tvrtko