[Intel-gfx] [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset

Daniele Ceraolo Spurio daniele.ceraolospurio at intel.com
Wed Apr 19 00:26:05 UTC 2017



On 18/04/17 13:23, Michel Thierry wrote:
> From: Arun Siluvery <arun.siluvery at linux.intel.com>
>
> GuC expects a list of registers from the driver which are saved/restored
> during engine reset. The type of value to be saved is controlled by flags.
> We provide a minimal set of registers that we want GuC to save and restore.
> This is not an issue in the case of an engine reset, since the driver
> reinitializes most of them afterwards, but in the case of a media reset
> (aka watchdog reset), which is handled completely inside GuC (including
> resubmission of the hung workload), this list is necessary; without it,
> GuC won't be able to schedule further workloads after the reset. This is
> the minimal set of registers identified for things to work as expected,
> but the list can be expanded if any new issues show up.
>
> v2: REGSET_MASKED is too difficult for GuC, so use REGSET_SAVE_DEFAULT_VALUE
> with the current value from the RING_MODE reg instead; no need to preserve
> head/tail either; be extra paranoid and save the whitelisted registers (Daniele).
>
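
For reference, the list being filled here is the per-engine
guc_mmio_regset in the ADS, where each entry pairs a register offset
with flags and an optional default value (that is what "the type of
value to be saved is controlled by flags" refers to). A minimal sketch
of the layout, as I read it in intel_guc_fwif.h (double-check the names
against the actual header):

	struct guc_mmio_reg {
		u32 offset;
		u32 value;
		u32 flags;	/* e.g. GUC_REGSET_SAVE_CURRENT_VALUE */
	} __packed;

	struct guc_mmio_regset {
		struct guc_mmio_reg registers[GUC_REGSET_MAX_REGISTERS];
		u32 values_valid;
		u32 number_of_registers;
	} __packed;
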
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
> Signed-off-by: Jeff McGee <jeff.mcgee at intel.com>
> Signed-off-by: Michel Thierry <michel.thierry at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_guc_submission.c | 60 +++++++++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 1ea36a88d2fb..d772718861df 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -1003,6 +1003,24 @@ static void guc_policies_init(struct guc_policies *policies)
>  	policies->is_valid = 1;
>  }
>
> +/*
> + * It is highly unlikely that we will ever exceed the max number of
> + * registers here, but even if we do it is not a fatal error, so just warn
> + * and continue. The only side effect of continuing is that some registers
> + * won't be added to the save/restore list.
> + */
> +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue)		\
> +	do {								\
> +		u32 __count = node->number_of_registers;		\
> +		if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS))	\
> +			continue;					\
> +		node->registers[__count].offset = reg_addr.reg;		\
> +		node->registers[__count].flags = (_flags);		\
> +		if (defvalue)						\
> +			node->registers[__count].value = (defvalue);	\
> +		node->number_of_registers++;				\
> +	} while (0)
> +
>  static int guc_ads_create(struct intel_guc *guc)
>  {
>  	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> @@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc)
>  		u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
>  	} __packed *blob;
>  	struct intel_engine_cs *engine;
> +	struct i915_workarounds *workarounds = &dev_priv->workarounds;
>  	enum intel_engine_id id;
>  	u32 base;
>
> @@ -1035,6 +1054,39 @@ static int guc_ads_create(struct intel_guc *guc)
>
>  	/* MMIO reg state */
>  	for_each_engine(engine, dev_priv, id) {
> +		u32 i;
> +		struct guc_mmio_regset *eng_reg =
> +			&blob->reg_state.engine_reg[engine->guc_id];
> +
> +		/*
> +		 * Provide a list of registers to be saved/restored during GPU
> +		 * reset. This is mainly required for media reset (aka watchdog
> +		 * timeout), which is completely under the control of GuC
> +		 * (resubmission of the hung workload is handled inside GuC).
> +		 */
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
> +				     GUC_REGSET_ENGINERESET |
> +				     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
> +
> +		/*
> +		 * Work around the GuC issue with masked registers. Note that
> +		 * at this point GuC submission is still disabled and the mode
> +		 * register doesn't have the irq_steering bit set, which we
> +		 * need in order to forward interrupts to GuC.
> +		 */
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
> +				     GUC_REGSET_ENGINERESET |
> +				     GUC_REGSET_SAVE_DEFAULT_VALUE,
> +				     I915_READ(RING_MODE_GEN7(engine)) |
> +				     GFX_INTERRUPT_STEERING | (0xFFFF<<16));
> +
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
> +				     GUC_REGSET_ENGINERESET |
> +				     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
> +

I might just be too paranoid, but I think that we also have to add the 
registers that we use for WAs via mmio (i.e. not using an LRI in the 
ringbuffer). I did a quick test for the registers in the 
gen9_init_workarounds and skl_init_workarounds functions that we write 
using I915_WRITE, and it looks like some of them lose their value after 
an RCS media reset:

  REG	   WA BITS	VAL PRE-MR	VAL POST-MR
0x20D4	(1<<2)		0x00000004	0x00000000
0xb11c	(1<<2)		0x00000005	0x00000001
0x4090	(1<<8) (1<<25)	0x02000100	0x02000100
0xb118	(1<<21)		0x40600000	0x40400000
0x20e0	(1<<14)		0x00004000	0x00000000
0xB004	(1<<7)		0x29124180	0x29124100

For some reason the media reset count in debugfs is still showing zero 
though, so I might be doing something wrong. Can you double-check what 
happens to those registers?
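
If some of those do turn out to be lost, I guess the simplest fix would
be to feed them to GuC through the macro this patch introduces. A rough
sketch, to go inside the for_each_engine() loop (not tested; the raw
offsets are just the ones from my table above, and the RCS-only check
is my assumption, since these are written from the gen9/skl render
workaround init):

	if (engine->id == RCS) {
		/* mmio-written (non-LRI) WA registers from the table above */
		static const u32 wa_mmio_regs[] = {
			0x20d4, 0xb11c, 0x4090, 0xb118, 0x20e0, 0xb004,
		};
		int j;

		/* have GuC save the current value and restore it on reset */
		for (j = 0; j < ARRAY_SIZE(wa_mmio_regs); j++)
			GUC_ADD_MMIO_REG_ADS(eng_reg, _MMIO(wa_mmio_regs[j]),
					     GUC_REGSET_ENGINERESET |
					     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
	}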

Thanks,
Daniele

> +		DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
> +				 engine->name, eng_reg->number_of_registers);
> +
>  		blob->reg_state.white_list[engine->guc_id].mmio_start =
>  			i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0));
>
> @@ -1044,9 +1096,13 @@ static int guc_ads_create(struct intel_guc *guc)
>  		 * inconsistencies with the handling of FORCE_TO_NONPRIV
>  		 * registers.
>  		 */
> -		blob->reg_state.white_list[engine->guc_id].count = 0;
> +		blob->reg_state.white_list[engine->guc_id].count =
> +					workarounds->hw_whitelist_count[id];
>
> -		/* Nothing to be saved or restored for now. */
> +		for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) {
> +			blob->reg_state.white_list[engine->guc_id].offsets[i] =
> +				I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i));
> +		}
>  	}
>
>  	/*
>

