[Intel-gfx] [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
Michel Thierry
michel.thierry at intel.com
Wed Apr 19 00:44:16 UTC 2017
On 18/04/17 17:26, Daniele Ceraolo Spurio wrote:
>
>
> On 18/04/17 13:23, Michel Thierry wrote:
>> From: Arun Siluvery <arun.siluvery at linux.intel.com>
>>
>> GuC expects a list of registers from the driver which are saved/restored
>> during engine reset. The type of value to be saved is controlled by
>> flags. We provide a minimal set of registers that we want GuC to save and
>> restore. This is not an issue in case of engine reset as driver
>> initializes
>> most of them following an engine reset, but in case of media reset (aka
>> watchdog reset) which is completely internal to GuC (including
>> resubmission
>> of hung workload), it is necessary to provide this list, otherwise GuC
>> won't
>> be able to schedule further workloads after a reset. This is the minimal
>> set of registers identified for things to work as expected but if we see
>> any new issues, this register list can be expanded.
>>
>> v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE
>> and current value from RING_MODE reg instead; no need to preserve
>> head/tail either, be extra paranoid and save whitelisted registers
>> (Daniele).
>>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
>> Signed-off-by: Jeff McGee <jeff.mcgee at intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_guc_submission.c | 60
>> +++++++++++++++++++++++++++++-
>> 1 file changed, 58 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>> index 1ea36a88d2fb..d772718861df 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>> @@ -1003,6 +1003,24 @@ static void guc_policies_init(struct
>> guc_policies *policies)
>> policies->is_valid = 1;
>> }
>>
>> +/*
>> + * In this macro it is highly unlikely to exceed max value but even
>> if we did
>> + * it is not an error so just throw a warning and continue. Only side
>> effect
>> + * in continuing further means some registers won't be added to
>> save/restore
>> + * list.
>> + */
>> +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue) \
>> + do { \
>> + u32 __count = node->number_of_registers; \
>> + if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS)) \
>> + continue; \
>> + node->registers[__count].offset = reg_addr.reg; \
>> + node->registers[__count].flags = (_flags); \
>> + if (defvalue) \
>> + node->registers[__count].value = (defvalue); \
>> + node->number_of_registers++; \
>> + } while (0)
>> +
>> static int guc_ads_create(struct intel_guc *guc)
>> {
>> struct drm_i915_private *dev_priv = guc_to_i915(guc);
>> @@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc)
>> u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
>> } __packed *blob;
>> struct intel_engine_cs *engine;
>> + struct i915_workarounds *workarounds = &dev_priv->workarounds;
>> enum intel_engine_id id;
>> u32 base;
>>
>> @@ -1035,6 +1054,39 @@ static int guc_ads_create(struct intel_guc *guc)
>>
>> /* MMIO reg state */
>> for_each_engine(engine, dev_priv, id) {
>> + u32 i;
>> + struct guc_mmio_regset *eng_reg =
>> + &blob->reg_state.engine_reg[engine->guc_id];
>> +
>> + /*
>> + * Provide a list of registers to be saved/restored during gpu
>> + * reset. This is mainly required for Media reset (aka watchdog
>> + * timeout) which is completely under the control of GuC
>> + * (resubmission of hung workload is handled inside GuC).
>> + */
>> + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
>> + GUC_REGSET_ENGINERESET |
>> + GUC_REGSET_SAVE_CURRENT_VALUE, 0);
>> +
>> + /*
>> + * Workaround the guc issue with masked registers, note that
>> + * at this point guc submission is still disabled and the mode
>> + * register doesnt have the irq_steering bit set, which we
>> + * need to fwd irqs to GuC.
>> + */
>> + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
>> + GUC_REGSET_ENGINERESET |
>> + GUC_REGSET_SAVE_DEFAULT_VALUE,
>> + I915_READ(RING_MODE_GEN7(engine)) |
>> + GFX_INTERRUPT_STEERING | (0xFFFF<<16));
>> +
>> + GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
>> + GUC_REGSET_ENGINERESET |
>> + GUC_REGSET_SAVE_CURRENT_VALUE, 0);
>> +
>
> I might just be too paranoid, but I think that we also have to add the
> registers that we use for WAs via mmio (i.e. not using an LRI in the
> ringbuffer). I did a quick test for the registers in the
> gen9_init_workarounds and skl_init_workarounds functions that we write
> using I915_WRITE and it looks like some of them lose their value after
> and RCS media reset:
>
> REG WA BITS VAL PRE-MR VAL POST-MR
> 0x20D4 (1<<2) 0x00000004 0x00000000
> 0xb11c (1<<2) 0x00000005 0x00000001
> 0x4090 (1<<8) (1<<25) 0x02000100 0x02000100
> 0xb118 (1<<21) 0x40600000 0x40400000
> 0x20e0 (1<<14) 0x00004000 0x00000000
> 0xB004 (1<<7) 0x29124180 0x29124100
>
Indeed. Kind of makes sense since i915 won't be aware of the reset at
all (compared to the non-guc version). Btw it only seems to happen if
the watchdog is triggered in the render engine.
> For some reason the media reset count in debugfs is still showing zero
> though, so I might be doing something wrong. Can you double check what
> happens to those registers?
>
You need fw version 9.x+
> Thanks,
> Daniele
>
>> + DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
>> + engine->name, eng_reg->number_of_registers);
>> +
>> blob->reg_state.white_list[engine->guc_id].mmio_start =
>>
>> i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0));
>>
>> @@ -1044,9 +1096,13 @@ static int guc_ads_create(struct intel_guc *guc)
>> * inconsistencies with the handling of FORCE_TO_NONPRIV
>> * registers.
>> */
>> - blob->reg_state.white_list[engine->guc_id].count = 0;
>> + blob->reg_state.white_list[engine->guc_id].count =
>> + workarounds->hw_whitelist_count[id];
>>
>> - /* Nothing to be saved or restored for now. */
>> + for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) {
>> + blob->reg_state.white_list[engine->guc_id].offsets[i] =
>> + I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i));
>> + }
>> }
>>
>> /*
>>
More information about the Intel-gfx
mailing list