[Intel-gfx] [PATCH v3 3/3] drm/i915: Fix premature LRC unpin in GuC mode

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Jan 22 08:33:18 PST 2016


On 21/01/16 11:02, Chris Wilson wrote:
> On Thu, Jan 21, 2016 at 10:10:42AM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> In GuC mode LRC pinning lifetime depends exclusively on the
>> request liftime. Since that is terminated by the seqno update
>> that opens up a race condition between GPU finishing writing
>> out the context image and the driver unpinning the LRC.
>>
>> To extend the LRC lifetime we will employ a similar approach
>> to what legacy ringbuffer submission does.
>>
>> We will start tracking the last submitted context per engine
>> and keep it pinned until it is replaced by another one.
>>
>> Note that the driver unload path is a bit fragile and could
>> benefit greatly from efforts to unify the legacy and exec
>> list submission code paths.
>>
>> At the moment i915_gem_context_fini has special casing for the
>> two which are potentialy not needed, and also depends on
>> i915_gem_cleanup_ringbuffer running before itself.
>>
>> v2:
>>   * Move pinning into engine->emit_request and actually fix
>>     the reference/unreference logic. (Chris Wilson)
>>
>>   * ring->dev can be NULL on driver unload so use a different
>>     route towards it.
>>
>> v3:
>>   * Rebase.
>>   * Handle the reset path. (Chris Wilson)
>>   * Exclude default context from the pinning - it is impossible
>>     to get it right before default context special casing in
>>     general is eliminated.
>
> Since you have the refactoring in place, eliminating the special case is
> trivial. So patch 2.5?
>
> Looks like http://patchwork.freedesktop.org/patch/69972/ but you already
> have half of that patch in place.
>
> Also if we take a step to add patch 2.1:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index c25083c78ba7..d43c886fd95a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -321,6 +321,14 @@ err_destroy:
>          return ERR_PTR(ret);
>   }
>
> +static void i915_gem_context_unpin(struct intel_context *ctx,
> +                                  struct intel_engine_cs *rcs)
> +{
> +       if (rcs->id == RCS && ctx->legacy_hw_ctx.rcs_state)
> +               i915_gem_object_ggtt_unpin(ctx->legacy_hw_ctx.rcs_state);
> +       i915_gem_context_unreference(ctx);
> +}
> +
>   void i915_gem_context_reset(struct drm_device *dev)
>   {
>          struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -338,13 +346,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>
>          for (i = 0; i < I915_NUM_RINGS; i++) {
>                  struct intel_engine_cs *ring = &dev_priv->ring[i];
> -               struct intel_context *lctx = ring->last_context;
>
> -               if (lctx) {
> -                       if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
> -                               i915_gem_object_ggtt_unpin(lctx->legacy_hw_ctx.rcs_state);
> -
> -                       i915_gem_context_unreference(lctx);
> +               if (ring->last_context) {
> +                       i915_gem_context_unpin(ring->last_context, ring);
>                          ring->last_context = NULL;
>                  }
>
> @@ -424,25 +428,18 @@ void i915_gem_context_fini(struct drm_device *dev)
>                   * to offset the do_switch part, so that i915_gem_context_unreference()
>                   * can then free the base object correctly. */
>                  WARN_ON(!dev_priv->ring[RCS].last_context);
> -               if (dev_priv->ring[RCS].last_context == dctx) {
> -                       /* Fake switch to NULL context */
> -                       WARN_ON(dctx->legacy_hw_ctx.rcs_state->active);
> -                       i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
> -                       i915_gem_context_unreference(dctx);
> -                       dev_priv->ring[RCS].last_context = NULL;
> -               }
> -
>                  i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
>          }
>
>          for (i = 0; i < I915_NUM_RINGS; i++) {
>                  struct intel_engine_cs *ring = &dev_priv->ring[i];
>
> -               if (ring->last_context)
> -                       i915_gem_context_unreference(ring->last_context);
> +               if (ring->last_context) {
> +                       i915_gem_context_unpin(ring->last_context, ring);
> +                       ring->last_context = NULL;
> +               }
>
>                  ring->default_context = NULL;
> -               ring->last_context = NULL;
>          }
>
>          i915_gem_context_unreference(dctx);
>
> Then we just end up with
>
> if (i915.enable_execlists)
> 	intel_lr_context_unpin(ring->last_context, ring);
> else
> 	i915_gem_context_unpin(ring->last_context, ring);
>
> which is much less messy in this patch and one step closer to unifying
> engine->pin_context()/engine->unpin_context() vfuns.
>
> The only tricky part there is having to disentangle the legacy paths,
> but doing so should allow us to see clearly that the sequence is just
>
> intel_gpu_reset();
> for_each_engine() unpin(engine->last_context);
> unpin(dev_priv->default_context);
>
> After a few more passes, we should be able to call the reset and cleanup
> functions higher up (as part of the stopping the GPU upon suspend
> procedure before tearing down the sw structs) at which point we only
> need the unpin(default_context) here.

I was confused by the current code which does the reset after unpinning:

...
	i915_gem_reset(dev);

	simulated = dev_priv->gpu_error.stop_rings != 0;

	ret = intel_gpu_reset(dev);
...


What is right then?

Sounds bad to be unpinning with the GPU in unknown state. But obviously 
it has been like this for who knows how long. So I have no idea. :(

Asking since GuC is at the moment struggling with reset, but then again 
that might be something completely else...

Regards,

Tvrtko


More information about the Intel-gfx mailing list