[Intel-gfx] [PATCH 4/6] drm/i915: Record the default hw state after reset upon load

Thu Nov 2 14:23:00 UTC 2017

On Thu, Nov 02, 2017 at 12:42:24PM +0000, Chris Wilson wrote:
> Take a copy of the HW state after a reset upon module loading by
> executing a context switch from a blank context to the kernel context,
> thus saving the default hw state over the blank context image.
> We can then use the default hw state to initialise any future context,
> ensuring that each starts with the default view of hw state.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gvt/scheduler.c    |  2 -
>  drivers/gpu/drm/i915/i915_debugfs.c     |  1 -
>  drivers/gpu/drm/i915/i915_gem.c         | 69 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_context.c | 50 +++---------------------
>  drivers/gpu/drm/i915/i915_gem_context.h |  4 +-
>  drivers/gpu/drm/i915/intel_engine_cs.c  |  3 ++
>  drivers/gpu/drm/i915/intel_lrc.c        | 35 ++++++++++-------
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 47 +++++++++++++++-------
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>  9 files changed, 137 insertions(+), 75 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> index f6ded475bb2c..42cc61230ca7 100644
> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> @@ -723,8 +723,6 @@ int intel_vgpu_init_gvt_context(struct intel_vgpu *vgpu)
>  	if (IS_ERR(vgpu->shadow_ctx))
>  		return PTR_ERR(vgpu->shadow_ctx);
>  
> -	vgpu->shadow_ctx->engine[RCS].initialised = true;
> -
>  	bitmap_zero(vgpu->shadow_ctx_desc_updated, I915_NUM_ENGINES);
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 39883cd915db..cfcef1899da8 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1974,7 +1974,6 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  			struct intel_context *ce = &ctx->engine[engine->id];
>  
>  			seq_printf(m, "%s: ", engine->name);
> -			seq_putc(m, ce->initialised ? 'I' : 'i');
>  			if (ce->state)
>  				describe_obj(m, ce->state->obj);
>  			if (ce->ring)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e36a3a840552..ed4b1708fec5 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4967,6 +4967,71 @@ bool intel_sanitize_semaphores(struct drm_i915_private *dev_priv, int value)
>  	return true;
>  }
>  
> +static int __intel_engines_record_defaults(struct drm_i915_private *i915)
> +{
> +	struct i915_gem_context *ctx;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	int err;
> +
> +	/*
> +	 * As we reset the gpu during very early sanitisation, the current
> +	 * register state on the GPU should reflect its defaults values.
> +	 * We load a context onto the hw (with restore-inhibit), then switch
> +	 * over to a second context to save that default register state. We
> +	 * can then prime every new context with that state so they all start
> +	 * from defaults.
> +	 */
> +
> +	ctx = i915_gem_context_create_kernel(i915, 0);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	for_each_engine(engine, i915, id) {
> +		struct drm_i915_gem_request *rq;
> +
> +		rq = i915_gem_request_alloc(engine, ctx);
> +		if (IS_ERR(rq)) {
> +			err = PTR_ERR(rq);
> +			goto out_ctx;
> +		}
> +
> +		err = i915_switch_context(rq);
> +		if (engine->init_context)
> +			err = engine->init_context(rq);
> +
> +		__i915_add_request(rq, true);
> +		if (err)
> +			goto out_ctx;
> +	}
> +
> +	err = i915_gem_switch_to_kernel_context(i915);
> +	if (err)
> +		goto out_ctx;
> +
> +	err = i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED);
> +	if (err)
> +		goto out_ctx;
> +
> +	for_each_engine(engine, i915, id) {
> +		if (!ctx->engine[id].state)
> +			continue;
> +
> +		engine->default_state =
> +			i915_gem_object_get(ctx->engine[id].state->obj);
> +
> +		err = i915_gem_object_set_to_cpu_domain(engine->default_state,
> +							false);
> +		if (err)
> +			goto out_ctx;

Should we unmap the default context from the GTT here to make sure
the GPU never gets a chance to clobber it?

I also had the notion that context might save a copy of its CCID, and
that we'd potentially have to adjust it in every new context image. But
now that I look at the docs it seems pre-HSW CCID was part of the
runlist part of the context which we don't use, and on HSW it seems to
be saved in the power context. And I presume it must also be part of
the power context on pre-HSW in ringbuffer mode, otherwise it would get
lost when entering rc6.

And indeed not having CCID in the context makes perfect sense since
MI_SET_CONTEXT provides the new value. Thus also getting it from
the context itself would be redundant, and potentially dangerous if
the context has moved to another location within the GTT since the
last time it was used.

-- 
Ville Syrjälä
Intel OTC