[Intel-gfx] [PATCH 4/6] drm/i915: Record the default hw state after reset upon load

Thu Nov 2 14:37:40 UTC 2017

Quoting Ville Syrjälä (2017-11-02 14:23:00)
> On Thu, Nov 02, 2017 at 12:42:24PM +0000, Chris Wilson wrote:
> > Take a copy of the HW state after a reset upon module loading by
> > executing a context switch from a blank context to the kernel context,
> > thus saving the default hw state over the blank context image.
> > We can then use the default hw state to initialise any future context,
> > ensuring that each starts with the default view of hw state.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/gvt/scheduler.c    |  2 -
> >  drivers/gpu/drm/i915/i915_debugfs.c     |  1 -
> >  drivers/gpu/drm/i915/i915_gem.c         | 69 +++++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_gem_context.c | 50 +++---------------------
> >  drivers/gpu/drm/i915/i915_gem_context.h |  4 +-
> >  drivers/gpu/drm/i915/intel_engine_cs.c  |  3 ++
> >  drivers/gpu/drm/i915/intel_lrc.c        | 35 ++++++++++-------
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 47 +++++++++++++++-------
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
> >  9 files changed, 137 insertions(+), 75 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> > index f6ded475bb2c..42cc61230ca7 100644
> > --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> > +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> > @@ -723,8 +723,6 @@ int intel_vgpu_init_gvt_context(struct intel_vgpu *vgpu)
> >       if (IS_ERR(vgpu->shadow_ctx))
> >               return PTR_ERR(vgpu->shadow_ctx);
> >  
> > -     vgpu->shadow_ctx->engine[RCS].initialised = true;
> > -
> >       bitmap_zero(vgpu->shadow_ctx_desc_updated, I915_NUM_ENGINES);
> >  
> >       return 0;
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 39883cd915db..cfcef1899da8 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -1974,7 +1974,6 @@ static int i915_context_status(struct seq_file *m, void *unused)
> >                       struct intel_context *ce = &ctx->engine[engine->id];
> >  
> >                       seq_printf(m, "%s: ", engine->name);
> > -                     seq_putc(m, ce->initialised ? 'I' : 'i');
> >                       if (ce->state)
> >                               describe_obj(m, ce->state->obj);
> >                       if (ce->ring)
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index e36a3a840552..ed4b1708fec5 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4967,6 +4967,71 @@ bool intel_sanitize_semaphores(struct drm_i915_private *dev_priv, int value)
> >       return true;
> >  }
> >  
> > +static int __intel_engines_record_defaults(struct drm_i915_private *i915)
> > +{
> > +     struct i915_gem_context *ctx;
> > +     struct intel_engine_cs *engine;
> > +     enum intel_engine_id id;
> > +     int err;
> > +
> > +     /*
> > +      * As we reset the gpu during very early sanitisation, the current
> > +      * register state on the GPU should reflect its defaults values.
> > +      * We load a context onto the hw (with restore-inhibit), then switch
> > +      * over to a second context to save that default register state. We
> > +      * can then prime every new context with that state so they all start
> > +      * from defaults.
> > +      */
> > +
> > +     ctx = i915_gem_context_create_kernel(i915, 0);
> > +     if (IS_ERR(ctx))
> > +             return PTR_ERR(ctx);
> > +
> > +     for_each_engine(engine, i915, id) {
> > +             struct drm_i915_gem_request *rq;
> > +
> > +             rq = i915_gem_request_alloc(engine, ctx);
> > +             if (IS_ERR(rq)) {
> > +                     err = PTR_ERR(rq);
> > +                     goto out_ctx;
> > +             }
> > +
> > +             err = i915_switch_context(rq);
> > +             if (engine->init_context)
> > +                     err = engine->init_context(rq);
> > +
> > +             __i915_add_request(rq, true);
> > +             if (err)
> > +                     goto out_ctx;
> > +     }
> > +
> > +     err = i915_gem_switch_to_kernel_context(i915);
> > +     if (err)
> > +             goto out_ctx;
> > +
> > +     err = i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED);
> > +     if (err)
> > +             goto out_ctx;
> > +
> > +     for_each_engine(engine, i915, id) {
> > +             if (!ctx->engine[id].state)
> > +                     continue;
> > +
> > +             engine->default_state =
> > +                     i915_gem_object_get(ctx->engine[id].state->obj);
> > +
> > +             err = i915_gem_object_set_to_cpu_domain(engine->default_state,
> > +                                                     false);
> > +             if (err)
> > +                     goto out_ctx;
> 
> Should we unmap the default context from the GTT here to make sure
> the GPU never gets a chance to clobber it?

Ah. I was thinking it would be unmapped as part of its teardown of the
context at the end of the function. But no, since we keep the object
around, and closing the context doesn't actually call i915_vma_close()
but just reaps the object as soon as possible. Whilst this is the only
place stealing the logical state object from the context, there isn't
much point in placing the fixup logic inside context-close, and we might
as well apply it here.

> I also had the notion that context might save a copy of its CCID, and
> that we'd potentially have to adjust it in every new context image. But
> now that I look at the docs it seems pre-HSW CCID was part of the
> runlist part of the context which we don't use, and on HSW it seems to
> be saved in the power context. And I presume it must also be part of
> the power context on pre-HSW in ringbuffer mode, otherwise it would get
> lost when entering rc6.

The CCID being part of the context would be scary, since it is just its
ggtt address. That would mean we would have to reload each ctx back into
its old location each time...

> And indeed not having CCID in the context makes perfect sense since
> MI_SET_CONTEXT provides the new value. Thus also getting it from
> the context itself would be redundant, and potentially dangerous if
> the context has moved to another location within the GTT since the
> last time it was used.

Nods. What is that say about great minds? Fools never differ. ;)
-Chris