[Intel-gfx] [PATCH v2 1/3] drm/i915: Fix eviction when the GGTT is idle but full

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Oct 11 12:05:02 UTC 2017


On 10/10/2017 22:38, Chris Wilson wrote:
> In the full-ppgtt world, we can fill the GGTT full of context objects.
> These context objects are currently implicitly tracked by the requests
> that pin them i.e. they are only unpinned when the request is completed
> and retired, but we do not have the link from the vma to the request
> (anymore). In order to unpin those contexts, we have to issue another
> request and wait upon the switch to the kernel context.
> 
> The bug during eviction was that we assumed that a full GGTT meant we
> would have requests on the GGTT timeline, and so we missed situations
> where those requests were merely in flight (and perhaps had not even
> been submitted to hw yet). The fix employed here is to change the
> already-is-idle test to not look at the execution timeline, but count the
> outstanding requests and then check that we have switched to the kernel
> context. Erring on the side of overkill here just means that we stall a
> little longer than may be strictly required, but we only expect to hit
> this path in extreme corner cases where returning an erroneous error is
> worse than the delay.
> 
> v2: Logical inversion when swapping over branches.
> 
> Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_evict.c | 63 ++++++++++++++++++++++-------------
>   1 file changed, 39 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index a5a5b7e6daae..ee4811ffb7aa 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -33,21 +33,20 @@
>   #include "intel_drv.h"
>   #include "i915_trace.h"
>   
> -static bool ggtt_is_idle(struct drm_i915_private *dev_priv)
> +static bool ggtt_is_idle(struct drm_i915_private *i915)
>   {
> -	struct i915_ggtt *ggtt = &dev_priv->ggtt;
> -	struct intel_engine_cs *engine;
> -	enum intel_engine_id id;
> +       struct intel_engine_cs *engine;
> +       enum intel_engine_id id;
>   
> -	for_each_engine(engine, dev_priv, id) {
> -		struct intel_timeline *tl;
> +       if (i915->gt.active_requests)
> +	       return false;
>   
> -		tl = &ggtt->base.timeline.engine[engine->id];
> -		if (i915_gem_active_isset(&tl->last_request))
> -			return false;
> -	}
> +       for_each_engine(engine, i915, id) {
> +	       if (engine->last_retired_context != i915->kernel_context)
> +		       return false;
> +       }
>   
> -	return true;
> +       return true;
>   }
>   
>   static int ggtt_flush(struct drm_i915_private *i915)
> @@ -157,7 +156,8 @@ i915_gem_evict_something(struct i915_address_space *vm,
>   				    min_size, alignment, cache_level,
>   				    start, end, mode);
>   
> -	/* Retire before we search the active list. Although we have
> +	/*
> +	 * Retire before we search the active list. Although we have
>   	 * reasonable accuracy in our retirement lists, we may have
>   	 * a stray pin (preventing eviction) that can only be resolved by
>   	 * retiring.
> @@ -182,7 +182,8 @@ i915_gem_evict_something(struct i915_address_space *vm,
>   		BUG_ON(ret);
>   	}
>   
> -	/* Can we unpin some objects such as idle hw contents,
> +	/*
> +	 * Can we unpin some objects such as idle hw contents,
>   	 * or pending flips? But since only the GGTT has global entries
>   	 * such as scanouts, rinbuffers and contexts, we can skip the
>   	 * purge when inspecting per-process local address spaces.
> @@ -190,19 +191,33 @@ i915_gem_evict_something(struct i915_address_space *vm,
>   	if (!i915_is_ggtt(vm) || flags & PIN_NONBLOCK)
>   		return -ENOSPC;
>   
> -	if (ggtt_is_idle(dev_priv)) {
> -		/* If we still have pending pageflip completions, drop
> -		 * back to userspace to give our workqueues time to
> -		 * acquire our locks and unpin the old scanouts.
> -		 */
> -		return intel_has_pending_fb_unpin(dev_priv) ? -EAGAIN : -ENOSPC;
> -	}
> +	/*
> +	 * Not everything in the GGTT is tracked via VMA using
> +	 * i915_vma_move_to_active(), otherwise we could evict as required
> +	 * with minimal stalling. Instead we are forced to idle the GPU and
> +	 * explicitly retire outstanding requests which will then remove
> +	 * the pinning for active objects such as contexts and ring,
> +	 * enabling us to evict them on the next iteration.
> +	 *
> +	 * To ensure that all user contexts are evictable, we perform
> +	 * a switch to the perma-pinned kernel context. This also gives
> +	 * us a termination condition: when the last retired context is
> +	 * the kernel's, there is no more we can evict.
> +	 */
> +	if (!ggtt_is_idle(dev_priv)) {
> +		ret = ggtt_flush(dev_priv);
> +		if (ret)
> +			return ret;
>   
> -	ret = ggtt_flush(dev_priv);
> -	if (ret)
> -		return ret;
> +		goto search_again;
> +	}
>   
> -	goto search_again;
> +	/*
> +	 * If we still have pending pageflip completions, drop
> +	 * back to userspace to give our workqueues time to
> +	 * acquire our locks and unpin the old scanouts.
> +	 */
> +	return intel_has_pending_fb_unpin(dev_priv) ? -EAGAIN : -ENOSPC;
>   
>   found:
>   	/* drm_mm doesn't allow any other other operations while
> 

Looks like it will fix the bug, and I can't spot any problem that it
introduces. Was there an igt test that was failing, or a bugzilla entry?
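
To make the new termination condition concrete, here is a minimal userspace
sketch of the idle test and the retry tail, not the driver code itself. The
model_* names and NUM_ENGINES are hypothetical stand-ins, the search for an
evictable hole is assumed to always fail, and model_ggtt_flush() simply
assumes that the flush succeeds and leaves every engine on the kernel context:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_ENGINES 3 /* hypothetical engine count for the model */

struct model_context { int id; };

struct model_engine {
	const struct model_context *last_retired_context;
};

struct model_gpu {
	unsigned int active_requests; /* models i915->gt.active_requests */
	const struct model_context *kernel_context;
	struct model_engine engine[NUM_ENGINES];
};

/* Idle only if nothing is outstanding and every engine sits on the kernel context. */
static bool model_ggtt_is_idle(const struct model_gpu *gpu)
{
	size_t i;

	if (gpu->active_requests)
		return false;

	for (i = 0; i < NUM_ENGINES; i++) {
		if (gpu->engine[i].last_retired_context != gpu->kernel_context)
			return false;
	}

	return true;
}

/* Assumption: a successful flush retires everything and parks on the kernel context. */
static int model_ggtt_flush(struct model_gpu *gpu)
{
	size_t i;

	gpu->active_requests = 0;
	for (i = 0; i < NUM_ENGINES; i++)
		gpu->engine[i].last_retired_context = gpu->kernel_context;

	return 0;
}

/* Tail of the eviction path, with the search assumed to find nothing to evict. */
static int model_evict_tail(struct model_gpu *gpu, bool pending_fb_unpin,
			    unsigned int *flushes)
{
	for (;;) {
		if (!model_ggtt_is_idle(gpu)) {
			int ret = model_ggtt_flush(gpu);

			if (ret)
				return ret;

			(*flushes)++;
			continue; /* corresponds to "goto search_again" */
		}

		/* Idle and still no space: nothing more can be evicted. */
		return pending_fb_unpin ? -EAGAIN : -ENOSPC;
	}
}
```

Starting with requests outstanding and user contexts last retired, the model
flushes once, sees an idle GGTT on the second pass, and gives up with -ENOSPC
(or -EAGAIN if a pageflip unpin is still pending), so the loop cannot spin
forever.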

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko