[Intel-gfx] [PATCH] drm/i915: Support creation of unbound wc user mappings for objects
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Dec 17 04:46:44 PST 2014
Hi,
On 10/23/2014 05:55 PM, Chris Wilson wrote:
> From: Akash Goel <akash.goel at intel.com>
>
> This patch provides support to create write-combining virtual mappings of
> GEM object. It intends to provide the same funtionality of 'mmap_gtt'
> interface without the constraints and contention of a limited aperture
> space, but requires clients handles the linear to tile conversion on their
> own. This is for improving the CPU write operation performance, as with such
> mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
> to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
> flush after update from CPU side, when object is passed onto GPU. This
> type of mapping is specially useful in case of sub-region update,
> i.e. when only a portion of the object is to be updated. Using a CPU mmap
> in such cases would normally incur a clflush of the whole object, and
> using a GTT mmapping would likely require eviction of an active object or
> fence and thus stall. The write-combining CPU mmap avoids both.
>
> To ensure the cache coherency, before using this mapping, the GTT domain
> has been reused here. This provides the required cache flush if the object
> is in CPU domain or synchronization against the concurrent rendering.
> Although the access through an uncached mmap should automatically
> invalidate the cache lines, this may not be true for non-temporal write
> instructions and also not all pages of the object may be updated at any
> given point of time through this mapping. Having a call to get_pages in
> set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
> Broaden application of set-domain(GTT)', would guarantee the clflush and
> so there will be no cachelines holding the data for the object before it
> is accessed through this map.
>
> The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
> extended with a new flags field (defaulting to 0 for existent users). In
> order for userspace to detect the extended ioctl, a new parameter
> I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.
>
> v2: Fix error handling, invalid flag detection, renaming (ickle)
>
> The new mmapping is exercised by igt/gem_mmap_wc,
> igt/gem_concurrent_blit and igt/gem_gtt_speed.
>
> Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> ---
> drivers/gpu/drm/i915/i915_dma.c | 3 +++
> drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++++++++++++
> include/uapi/drm/i915_drm.h | 9 +++++++++
> 3 files changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 1b398070b230..496fb72e25c8 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1027,6 +1027,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
> case I915_PARAM_CMD_PARSER_VERSION:
> value = i915_cmd_parser_get_version();
> break;
> + case I915_PARAM_MMAP_VERSION:
> + value = 1;
> + break;
> default:
> DRM_DEBUG("Unknown parameter %d\n", param->param);
> return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 895f9881f0aa..7d7265fcee95 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1485,6 +1485,12 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
> struct drm_gem_object *obj;
> unsigned long addr;
>
> + if (args->flags & ~(I915_MMAP_WC))
> + return -EINVAL;
> +
> + if (args->flags & I915_MMAP_WC && !cpu_has_pat)
> + return -ENODEV;
> +
> obj = drm_gem_object_lookup(dev, file, args->handle);
> if (obj == NULL)
> return -ENOENT;
> @@ -1500,6 +1506,19 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
> addr = vm_mmap(obj->filp, 0, args->size,
> PROT_READ | PROT_WRITE, MAP_SHARED,
> args->offset);
> + if (args->flags & I915_MMAP_WC) {
> + struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> +
> + down_write(&mm->mmap_sem);
> + vma = find_vma(mm, addr);
> + if (vma)
> + vma->vm_page_prot =
> + pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> + else
> + addr = -ENOMEM;
> + up_write(&mm->mmap_sem);
> + }
> drm_gem_object_unreference_unlocked(obj);
> if (IS_ERR((void *)addr))
> return addr;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index ff57f07c3249..9b3762724a2b 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
> #define I915_PARAM_HAS_EXEC_HANDLE_LUT 26
> #define I915_PARAM_HAS_WT 27
> #define I915_PARAM_CMD_PARSER_VERSION 28
> +#define I915_PARAM_MMAP_VERSION 29
>
> typedef struct drm_i915_getparam {
> int param;
> @@ -487,6 +488,14 @@ struct drm_i915_gem_mmap {
> * This is a fixed-size type for 32/64 compatibility.
> */
> __u64 addr_ptr;
> +
> + /**
> + * Flags for extended behaviour.
> + *
> + * Added in version 2.
> + */
> + __u64 flags;
> +#define I915_MMAP_WC 0x1
> };
>
> struct drm_i915_gem_mmap_gtt {
I had to spend some time reading about stuff but eventually this looks
good to me. So:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Just one thing to clarify - the same write combining speed up effect is
already possible using the mentioned non-temporal instructions, this
just makes it easier to use for a wider group of users?
Regards,
Tvrtko
More information about the Intel-gfx
mailing list