[Intel-gfx] [PATCH] drm/i915: Use the LLC mode on gen6 for everything but display.

Chris Wilson chris at chris-wilson.co.uk
Fri Mar 11 22:07:10 CET 2011


On Fri, 11 Mar 2011 12:11:20 -0800, Eric Anholt <eric at anholt.net> wrote:
> This provided a 10.4% +/- 1.5% (n=3) performance improvement on
> openarena on my laptop.  We have more room to improve with doing LLC
> caching for display using GFDT, and in doing LLC+MLC caching, but this
> was an easy performance win and incremental improvement toward those
> two.
> 
> Signed-off-by: Eric Anholt <eric at anholt.net>
> ---
> 
> Just wrote this and tested it this morning.  Testing was rough, as
> just starting with 2.6.38-rc8+patfix, I had a bunch of:
> 
> [  156.985125] [drm:drm_crtc_helper_set_config] *ERROR* failed to set mode on [CRTC:3]
> 
> and thus broken display, though I'm on a preproduction machine still
> so I'm less worried than I would otherwise be.

Signal arrived at an inopportune time, typically because we needed to
evict in order to pin the scanout. The fix is either to thread the
interruptible status everywhere, make drm modesetting interruptible or to
flag the device as uninterruptible. The latter was much easier.

> Can anyone think of anything but display and cursors that would be
> incoherent with LLC?

I'm only aware of the display engine needing special treatment.

I did this patch in a few stages. First to introduce rebinding, so that we
could modify the PTEs in place. Then rebind to uncached in
move_to_display_plane. However, that loses out on your is pinned safety
check, and that is check is a good thing. And then change the default
agp_types. I was also looking at exposing the cacheability (and changing
the name to be more descriptive of how it is used by GEM) to userspace
(and so using rebind for more then just the display plane transition)
and as part of the vmap series.

A major side-effect of pre-fetch is that we cannot have different memory
types adjacent in the GATT, but we must engineer fire-breaks between
snooped/unsnooped memory. (Only really an issue with earlier gen, but
something a more general solution has to cope with.)

> 
>  drivers/gpu/drm/i915/i915_gem.c      |   18 +++++++++++-
>  drivers/gpu/drm/i915/intel_display.c |   51 ++++++++++++++++++++++++++++++++++
>  2 files changed, 68 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 36e66cc..f4f6de3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3539,7 +3539,23 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>  	obj->base.write_domain = I915_GEM_DOMAIN_CPU;
>  	obj->base.read_domains = I915_GEM_DOMAIN_CPU;
>  
> -	obj->agp_type = AGP_USER_MEMORY;
> +	if (IS_GEN6(dev)) {
> +		/* On Gen6, we can have the GPU use the LLC (the CPU
> +		 * cache) for about a 10% performance improvement
> +		 * compared to uncached.  Graphics requests other than
> +		 * display scanout are coherent with the CPU in
> +		 * accessing this cache.  This means in this mode we
> +		 * don't need to clflush on the CPU side, and on the
> +		 * GPU side we only need to flush internal caches to
> +		 * get data visible to the CPU.
> +		 *
> +		 * For display, see intel_pin_and_fence_fb_obj() for
> +		 * how we handle changing the caching.
> +		 */
> +		obj->agp_type = AGP_USER_CACHED_MEMORY;
> +	} else {
> +		obj->agp_type = AGP_USER_MEMORY;
> +	}

I made this a field under INTEL_INFO for flexibility. Probably overkill.

>  	obj->base.driver_private = NULL;
>  	obj->fence_reg = I915_FENCE_REG_NONE;
>  	INIT_LIST_HEAD(&obj->mm_list);
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 49fb54f..0b3096a 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -1456,6 +1456,45 @@ out_disable:
>  	}
>  }
>  
> +/* The display engine is not coherent with the LLC cache on gen6.  As
> + * a result, we make sure that the pinning that is about to occur is
> + * done with uncached PTEs.
> + *
> + * We could do this better in a couple of ways.  The most important
> + * would be to use the GFDT bit instead of uncaching, which would
> + * allow us to flush all the LLC-cached data with that bit in the PTE
> + * to main memory with just one PIPE_CONTROL.  The other would be to
> + * update the PTEs by calling i915_gem_gtt_bind_object() and then
> + * flush any existing CPU cache of the object, instead of unbinding.
> + */
The comment is better suited at the callsite as to why it is needed to be
called.

> +static int
> +i915_set_pte_uncached(struct drm_i915_gem_object *obj)
i915_gem_object_set_agp_type or i915_gem_object_set_cache_level
> +{
> +	int ret;
> +
> +	if (obj->agp_type == AGP_USER_MEMORY)
> +		return 0;
> +
> +	obj->agp_type = AGP_USER_MEMORY;
> +
> +	if (obj->pin_count > 0) {
> +		static int once = 0;
> +		if (!once) {
> +			DRM_ERROR("Trying to change caching on pinned fb\n");
> +			once = 1;
> +		}
> +		return -EBUSY;
if (WARN_ONCE(obj->pin_count, "blah...")) return -EBUSY;

> +	}
> +
> +	ret = i915_gem_object_unbind(obj);
> +	if (ret)
> +		return ret;
> +
> +	obj->agp_type = AGP_USER_MEMORY;
> +
> +	return 0;
> +}
> +
>  int
>  intel_pin_and_fence_fb_obj(struct drm_device *dev,
>  			   struct drm_i915_gem_object *obj,
> @@ -1485,6 +1524,12 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev,
>  		BUG();
>  	}
>  
> +	if (IS_GEN6(dev)) {
> +		ret = i915_set_pte_uncached(obj);
> +		if (ret)
> +			return ret;
> +	}

This can be done unconditionally, since our current design is to use uncached
for all scanout on all generations. The function should be passed the new
agp_type and so we can reuse the function later.

> +
>  	ret = i915_gem_object_pin(obj, alignment, true);
>  	if (ret)
>  		return ret;
> @@ -4740,6 +4785,12 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
>  			goto fail_locked;
>  		}
>  
> +		if (IS_GEN6(dev)) {
> +			ret = i915_set_pte_uncached(obj);
> +			if (ret)
> +				goto fail_locked;
> +		}
> +
>  		ret = i915_gem_object_pin(obj, PAGE_SIZE, true);
>  		if (ret) {
>  			DRM_ERROR("failed to pin cursor bo\n");

-- 
Chris Wilson, Intel Open Source Technology Centre



More information about the Intel-gfx mailing list