[Intel-gfx] [RFC 2/2] drm/i915: Moved the cache flush outside the 'struct_mutex' lock
Chris Wilson
chris at chris-wilson.co.uk
Thu Aug 6 05:06:53 PDT 2015
On Sun, May 04, 2014 at 04:48:25PM +0530, akash.goel at intel.com wrote:
> From: Akash Goel <akash.goel at intel.com>
>
> Moved the cache flush of the preallocated shmem pages outside
> the span of 'struct_mutex' lock. This shall not lead to any
> redundancy as the cache flush of the newly allocated pages
> will be done anyways when same buffer is submitted to GPU or
> when the domain of the object is changed from CPU to GTT
> in Gem page fault handler.
>
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b19ccb8..7ab2635 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1936,8 +1936,11 @@ i915_gem_object_shmem_preallocate(struct drm_i915_gem_object *obj)
> if (IS_ERR(page)) {
> DRM_DEBUG_DRIVER("Failure for obj(%p), size(%x) at page(%d)\n",
> obj, obj->base.size, i);
> - break;
> + return;
> }
> + /* Flush the cpu cache for the page now itself */
> + drm_clflush_pages(&page, 1);
> +
> /* Decrement the extra ref count on the returned page,
> otherwise when 'get_pages_gtt' will be called later on
> in the regular path, it will also increment the ref count,
> @@ -1945,6 +1948,14 @@ i915_gem_object_shmem_preallocate(struct drm_i915_gem_object *obj)
> page_cache_release(page);
> }
>
> + /*
> + * Reset the CPU domain now itself, so as to avoid the cache
> + * flush later (under 'struct_mutex' lock), as the all pages
> + * have been cache flushed.
> + * Hope this is safe enough to be done here.
> + */
> + obj->base.write_domain = 0;
So cache-domain-management is under the struct_mutex, for good reason.
However, if we follow the idea to move this to create2, then adding a
flag to also do the clflush upon allocation seems fine. Note that I
don't think doing it by default is a good match for everything, since my
code and also the kernel already takes advantage of the fact that fresh
buffers are in the CPU cache domain and uses that to do any initial
transfers in the CPU domain.
So we have I915_CREATE_POPULATE, I915_CREATE_COHERENT ?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list