[Intel-gfx] [PATCH v2 1/3] drm/i915: Fix eviction when the GGTT is idle but full
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Oct 11 12:05:02 UTC 2017
On 10/10/2017 22:38, Chris Wilson wrote:
> In the full-ppgtt world, we can fill the GGTT full of context objects.
> These context objects are currently implicitly tracked by the requests
> that pin them i.e. they are only unpinned when the request is completed
> and retired, but we do not have the link from the vma to the request
> (anymore). In order to unpin those contexts, we have to issue another
> request and wait upon the switch to the kernel context.
>
> The bug during eviction was that we assumed that a full GGTT meant we
> would have requests on the GGTT timeline, and so we missed situations
> where those requests were merely in flight (even when they have not
> yet been submitted to hw). The fix employed here is to change the
> already-is-idle test to not look at the execution timeline, but count the
> outstanding requests and then check that we have switched to the kernel
> context. Erring on the side of overkill here just means that we stall a
> little longer than may be strictly required, but we only expect to hit
> this path in extreme corner cases where returning an erroneous error is
> worse than the delay.
>
> v2: Logical inversion when swapping over branches.
>
> Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_evict.c | 63 ++++++++++++++++++++++-------------
> 1 file changed, 39 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index a5a5b7e6daae..ee4811ffb7aa 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -33,21 +33,20 @@
> #include "intel_drv.h"
> #include "i915_trace.h"
>
> -static bool ggtt_is_idle(struct drm_i915_private *dev_priv)
> +static bool ggtt_is_idle(struct drm_i915_private *i915)
> {
> - struct i915_ggtt *ggtt = &dev_priv->ggtt;
> - struct intel_engine_cs *engine;
> - enum intel_engine_id id;
> + struct intel_engine_cs *engine;
> + enum intel_engine_id id;
>
> - for_each_engine(engine, dev_priv, id) {
> - struct intel_timeline *tl;
> + if (i915->gt.active_requests)
> + return false;
>
> - tl = &ggtt->base.timeline.engine[engine->id];
> - if (i915_gem_active_isset(&tl->last_request))
> - return false;
> - }
> + for_each_engine(engine, i915, id) {
> + if (engine->last_retired_context != i915->kernel_context)
> + return false;
> + }
>
> - return true;
> + return true;
> }
>
> static int ggtt_flush(struct drm_i915_private *i915)
> @@ -157,7 +156,8 @@ i915_gem_evict_something(struct i915_address_space *vm,
> min_size, alignment, cache_level,
> start, end, mode);
>
> - /* Retire before we search the active list. Although we have
> + /*
> + * Retire before we search the active list. Although we have
> * reasonable accuracy in our retirement lists, we may have
> * a stray pin (preventing eviction) that can only be resolved by
> * retiring.
> @@ -182,7 +182,8 @@ i915_gem_evict_something(struct i915_address_space *vm,
> BUG_ON(ret);
> }
>
> - /* Can we unpin some objects such as idle hw contents,
> + /*
> + * Can we unpin some objects such as idle hw contents,
> * or pending flips? But since only the GGTT has global entries
> * such as scanouts, rinbuffers and contexts, we can skip the
> * purge when inspecting per-process local address spaces.
> @@ -190,19 +191,33 @@ i915_gem_evict_something(struct i915_address_space *vm,
> if (!i915_is_ggtt(vm) || flags & PIN_NONBLOCK)
> return -ENOSPC;
>
> - if (ggtt_is_idle(dev_priv)) {
> - /* If we still have pending pageflip completions, drop
> - * back to userspace to give our workqueues time to
> - * acquire our locks and unpin the old scanouts.
> - */
> - return intel_has_pending_fb_unpin(dev_priv) ? -EAGAIN : -ENOSPC;
> - }
> + /*
> + * Not everything in the GGTT is tracked via VMA using
> + * i915_vma_move_to_active(), otherwise we could evict as required
> + * with minimal stalling. Instead we are forced to idle the GPU and
> + * explicitly retire outstanding requests which will then remove
> + * the pinning for active objects such as contexts and ring,
> + * enabling us to evict them on the next iteration.
> + *
> + * To ensure that all user contexts are evictable, we perform
> + * a switch to the perma-pinned kernel context. This all also gives
> + * us a termination condition, when the last retired context is
> + * the kernel's there is no more we can evict.
> + */
> + if (!ggtt_is_idle(dev_priv)) {
> + ret = ggtt_flush(dev_priv);
> + if (ret)
> + return ret;
>
> - ret = ggtt_flush(dev_priv);
> - if (ret)
> - return ret;
> + goto search_again;
> + }
>
> - goto search_again;
> + /*
> + * If we still have pending pageflip completions, drop
> + * back to userspace to give our workqueues time to
> + * acquire our locks and unpin the old scanouts.
> + */
> + return intel_has_pending_fb_unpin(dev_priv) ? -EAGAIN : -ENOSPC;
>
> found:
> /* drm_mm doesn't allow any other other operations while
>
Looks like it will fix the bug, and I can't spot any problem that it
introduces. Was there an igt that was failing, or any bugzilla?
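
For anyone following along, here is a minimal standalone sketch of the new
idle test and the fallback path as described in the commit message. It is a
userspace model only, not the driver code: every `model_*` name, the
`NUM_ENGINES` constant and the `MODEL_*` result codes are hypothetical
stand-ins, and `model_ggtt_flush()` merely assumes that a flush ends with no
outstanding requests and the kernel context retired on every engine.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_ENGINES 3 /* hypothetical engine count for this model */

struct model_context { int id; };

struct model_engine {
	const struct model_context *last_retired_context;
};

struct model_gpu {
	unsigned int active_requests; /* outstanding, possibly unsubmitted */
	const struct model_context *kernel_context;
	struct model_engine engine[NUM_ENGINES];
};

enum model_result { MODEL_SEARCH_AGAIN, MODEL_EAGAIN, MODEL_ENOSPC };

/*
 * Idle only if no requests are outstanding and every engine last
 * retired the kernel context, mirroring the patched ggtt_is_idle().
 */
static bool model_ggtt_is_idle(const struct model_gpu *gpu)
{
	size_t i;

	if (gpu->active_requests)
		return false;

	for (i = 0; i < NUM_ENGINES; i++) {
		if (gpu->engine[i].last_retired_context != gpu->kernel_context)
			return false;
	}

	return true;
}

/*
 * Assumed effect of a flush: all requests retired and every engine
 * switched to the kernel context. The real ggtt_flush() can fail;
 * this model cannot.
 */
static void model_ggtt_flush(struct model_gpu *gpu)
{
	size_t i;

	gpu->active_requests = 0;
	for (i = 0; i < NUM_ENGINES; i++)
		gpu->engine[i].last_retired_context = gpu->kernel_context;
}

/*
 * Tail of the eviction path: flush and retry while not idle; once
 * idle there is nothing more to evict, so report an error.
 */
static enum model_result model_evict_fallback(struct model_gpu *gpu,
					      bool pending_fb_unpin)
{
	if (!model_ggtt_is_idle(gpu)) {
		model_ggtt_flush(gpu);
		return MODEL_SEARCH_AGAIN;
	}

	return pending_fb_unpin ? MODEL_EAGAIN : MODEL_ENOSPC;
}
```

The point of the model is the termination condition: the first call on a busy
GPU asks for another search, and a second call straight after the flush can
only return an error, so the retry loop cannot spin forever.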
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko