[Intel-gfx] [PATCH] drm/i915: Support to create uncached user mapping for a Gem object

Chris Wilson chris at chris-wilson.co.uk
Thu Oct 23 12:54:19 CEST 2014


On Thu, Oct 23, 2014 at 04:03:56PM +0530, akash.goel at intel.com wrote:
> From: Akash Goel <akash.goel at intel.com>
> 
> This patch provides support to create uncached virtual mappings for a Gem
> object. It intends to provide the same funtionality of 'mmap_gtt' interface
> without the constraints of a limited aperture space, but provided clients
> handles the linear to tile conversion on their own.
> This is for improving the CPU write operation performance, as with such
> mapping, writes are almost 50% faster than with mmap_gtt.

That seems like it should be easy to demonstrate with igt/gem_gtt_speed.
You should also copy igt/gem_mmap_gtt to test the interface, and extend
igt/gem_concurrent_blt to make sure that mmap(wc)+domain vs the GPU is
correct.

> Also it avoids the
> Cache flush after update from CPU side, when object is passed onto GPU, which
> will be the case if regular mmap ioctl interface is used.
                             ^CPU

> This type of mapping is specially useful in case of sub-region update,
> i.e. when only a portion of the object is to be updated.
> To ensure the cache coherency, before using this mapping, the GTT domain has
> been reused here. This provides the required Cache flush if the object is in
> CPU domain or synchronization against the concurrent rendering. Although the
> access through an uncached mmap shall automatically invalidate the cache lines,
> but this may not be true for non temporal write instructions and also not all
> pages of the object be updated at any given point of time through this mapping.
> Having a call to get_pages in set_to_gtt_domain function, as added by Chris in
> the earlier patch, would guarantee the clflush and so there will be no cache-
> lines holding the data for the object before it is accessed through this map.
> A new field 'flags' has been added to 'drm_i915_gem_mmap' structure, used in
> gem_mmap ioctl, which allows to convey the required mapping type as uncached.
> User can query the driver for the support of this mapping through the
> get_params. For that a new define I915_PARAM_HAS_UC_MMAP has been added.

I quibble over using the term uncached here, as what we mean is
write-combining.  They are quite different in terms of memory access!

I've put together a usecase in
http://cgit.freedesktop.org/~ickle/xf86-video-intel/commit/?h=wc-mmap&id=118e98a2d448469064cdcedf2013a91c3a69dab4
which I'll test shortly. It uses the new mmapping to allow direct access
to larger than aperture buffers, prefers the unbound access for linear
buffers and allows manual detiled uploads on !llc platforms.

> Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c |  3 +++
>  drivers/gpu/drm/i915/i915_gem.c | 17 +++++++++++++++++
>  include/uapi/drm/i915_drm.h     |  4 ++++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 1b39807..2d8191a 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1027,6 +1027,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  	case I915_PARAM_CMD_PARSER_VERSION:
>  		value = i915_cmd_parser_get_version();
>  		break;
> +	case I915_PARAM_HAS_UC_MMAP:
> +		value = 1;
> +		break;
>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
>  		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1b192d4..16b267b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1492,6 +1492,23 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
>  	addr = vm_mmap(obj->filp, 0, args->size,
>  		       PROT_READ | PROT_WRITE, MAP_SHARED,
>  		       args->offset);
> +
> +	if (args->flags & I915_GEM_USE_UNCACHED_MMAP) {
> +		struct mm_struct *mm = current->mm;
> +		struct vm_area_struct *vma;

Newline please between variables and code.

> +		down_write(&mm->mmap_sem);
> +		vma = find_vma(mm, addr);
> +		if (!vma) {
> +			drm_gem_object_unreference_unlocked(obj);
> +			return -EINVAL;
Eek returning with mmap_sem held.

> +		}
> +		/* Change the page attribute to uncached (along with
> +		 * write-combinning to get better performance) */
> +		vma->vm_page_prot =
> +			pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> +		up_write(&mm->mmap_sem);

Perhaps,

down_write();
vma = find_vma(mm, addr);
if (vma && cpu_has_pat) 
	vma->vm_page_prot = pgprot_writecombing();
else
	addr = ERR_PTR(-ENODEV);
up_write();

I am pretty sure we can't do this trick on really old CPUs!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre



More information about the Intel-gfx mailing list