[Intel-gfx] [PATCH] drm/i915: Restore inhibiting the load of the default context

Chris Wilson chris at chris-wilson.co.uk
Fri Nov 27 05:14:43 PST 2015


On Fri, Nov 27, 2015 at 01:32:11PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Following a GPU reset, we may leave the context in a poorly defined
> > state, and reloading from that context will leave the GPU flummoxed. For
> > secondary contexts, this will lead to that context being banned - but
> > currently it is also causing the default context to become banned,
> > leading to turmoil in the shared state.
> >
> > This is a regression from
> >
> > commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
> > Author: Ben Widawsky <benjamin.widawsky at intel.com>
> > Date:   Mon Mar 16 16:00:58 2015 +0000
> >
> >     drm/i915: Initialize all contexts
> >
> > which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
> > default context.
> >
> 
> As we never submit anything except driver initialization commands
> for that context, what would cause this context to become corrupted?

I can only hazard a guess that the act of resetting the GPU left it invalid. A
bisect pointed to that commit, and partially reverting each chunk left
me with the conclusion that the hang was a direct result of reloading
the context. Closer inspection may reveal something else suspect about the
context, but I object to this sneaky change.

> Please consider:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 43761c5..45b9a39 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -332,6 +332,7 @@ void i915_gem_context_reset(struct drm_device *dev)
>         for (i = 0; i < I915_NUM_RINGS; i++) {
>                 struct intel_engine_cs *ring = &dev_priv->ring[i];
>                 struct intel_context *lctx = ring->last_context;
> +               struct intel_context *dctx = ring->default_context;
>  
>                 if (lctx) {
>                         if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
> @@ -340,6 +341,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>                         i915_gem_context_unreference(lctx);
>                         ring->last_context = NULL;
>                 }
> +
> +               if (dctx)
> +                       dctx->legacy_hw_ctx.initialized = false;
>         }
>  }
> 
> To achieve the same effect and, as a bonus, get the
> same default context (with workarounds) as we
> did in driver init.

I considered it, and wondered why it wasn't already there. It is a
separate issue imo.
 
> I also think that we should zero the global
> default context in here so that it matches
> its state after module init.

You mean reallocate it from scratch? We have avoided doing the
reallocations in the past, as they can fail at inopportune times.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre