[Intel-gfx] [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset

Michel Thierry michel.thierry at intel.com
Wed Apr 19 00:44:16 UTC 2017


On 18/04/17 17:26, Daniele Ceraolo Spurio wrote:
>
>
> On 18/04/17 13:23, Michel Thierry wrote:
>> From: Arun Siluvery <arun.siluvery at linux.intel.com>
>>
>> GuC expects a list of registers from the driver which are saved/restored
>> during engine reset. The type of value to be saved is controlled by
>> flags. We provide a minimal set of registers that we want GuC to save and
>> restore. This is not an issue in case of engine reset as driver
>> initializes
>> most of them following an engine reset, but in case of media reset (aka
>> watchdog reset) which is completely internal to GuC (including
>> resubmission
>> of hung workload), it is necessary to provide this list, otherwise GuC
>> won't
>> be able to schedule further workloads after a reset. This is the minimal
>> set of registers identified for things to work as expected but if we see
>> any new issues, this register list can be expanded.
>>
>> v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE
>> and current value from RING_MODE reg instead; no need to preserve
>> head/tail either, be extra paranoid and save whitelisted registers
>> (Daniele).
>>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
>> Signed-off-by: Jeff McGee <jeff.mcgee at intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry at intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_guc_submission.c | 60
>> +++++++++++++++++++++++++++++-
>>  1 file changed, 58 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>> index 1ea36a88d2fb..d772718861df 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>> @@ -1003,6 +1003,24 @@ static void guc_policies_init(struct
>> guc_policies *policies)
>>      policies->is_valid = 1;
>>  }
>>
>> +/*
>> + * In this macro it is highly unlikely to exceed max value but even
>> if we did
>> + * it is not an error so just throw a warning and continue. Only side
>> effect
>> + * in continuing further means some registers won't be added to
>> save/restore
>> + * list.
>> + */
>> +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue)        \
>> +    do {                                \
>> +        u32 __count = node->number_of_registers;        \
>> +        if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS))    \
>> +            continue;                    \
>> +        node->registers[__count].offset = reg_addr.reg;        \
>> +        node->registers[__count].flags = (_flags);        \
>> +        if (defvalue)                        \
>> +            node->registers[__count].value = (defvalue);    \
>> +        node->number_of_registers++;                \
>> +    } while (0)
>> +
>>  static int guc_ads_create(struct intel_guc *guc)
>>  {
>>      struct drm_i915_private *dev_priv = guc_to_i915(guc);
>> @@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc)
>>          u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
>>      } __packed *blob;
>>      struct intel_engine_cs *engine;
>> +    struct i915_workarounds *workarounds = &dev_priv->workarounds;
>>      enum intel_engine_id id;
>>      u32 base;
>>
>> @@ -1035,6 +1054,39 @@ static int guc_ads_create(struct intel_guc *guc)
>>
>>      /* MMIO reg state */
>>      for_each_engine(engine, dev_priv, id) {
>> +        u32 i;
>> +        struct guc_mmio_regset *eng_reg =
>> +            &blob->reg_state.engine_reg[engine->guc_id];
>> +
>> +        /*
>> +         * Provide a list of registers to be saved/restored during gpu
>> +         * reset. This is mainly required for Media reset (aka watchdog
>> +         * timeout) which is completely under the control of GuC
>> +         * (resubmission of hung workload is handled inside GuC).
>> +         */
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
>> +                     GUC_REGSET_ENGINERESET |
>> +                     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
>> +
>> +        /*
>> +         * Workaround the guc issue with masked registers, note that
>> +         * at this point guc submission is still disabled and the mode
>> +         * register doesnt have the irq_steering bit set, which we
>> +         * need to fwd irqs to GuC.
>> +         */
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
>> +                     GUC_REGSET_ENGINERESET |
>> +                     GUC_REGSET_SAVE_DEFAULT_VALUE,
>> +                     I915_READ(RING_MODE_GEN7(engine)) |
>> +                     GFX_INTERRUPT_STEERING | (0xFFFF<<16));
>> +
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
>> +                     GUC_REGSET_ENGINERESET |
>> +                     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
>> +
>
> I might just be too paranoid, but I think that we also have to add the
> registers that we use for WAs via mmio (i.e. not using an LRI in the
> ringbuffer). I did a quick test for the registers in the
> gen9_init_workarounds and skl_init_workarounds functions that we write
> using I915_WRITE and it looks like some of them lose their value after
> and RCS media reset:
>
>  REG       WA BITS    VAL PRE-MR    VAL POST-MR
> 0x20D4    (1<<2)        0x00000004    0x00000000
> 0xb11c    (1<<2)        0x00000005    0x00000001
> 0x4090    (1<<8) (1<<25)    0x02000100    0x02000100
> 0xb118    (1<<21)        0x40600000    0x40400000
> 0x20e0    (1<<14)        0x00004000    0x00000000
> 0xB004    (1<<7)        0x29124180    0x29124100
>

Indeed. Kind of makes sense since i915 won't be aware of the reset at 
all (compared to the non-guc version). Btw it only seems to happen if 
the watchdog is triggered in the render engine.

> For some reason the media reset count in debugfs is still showing zero
> though, so I might be doing something wrong. Can you double check what
> happens to those registers?
>

You need fw version 9.x+

> Thanks,
> Daniele
>
>> +        DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
>> +                 engine->name, eng_reg->number_of_registers);
>> +
>>          blob->reg_state.white_list[engine->guc_id].mmio_start =
>>
>> i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0));
>>
>> @@ -1044,9 +1096,13 @@ static int guc_ads_create(struct intel_guc *guc)
>>           * inconsistencies with the handling of FORCE_TO_NONPRIV
>>           * registers.
>>           */
>> -        blob->reg_state.white_list[engine->guc_id].count = 0;
>> +        blob->reg_state.white_list[engine->guc_id].count =
>> +                    workarounds->hw_whitelist_count[id];
>>
>> -        /* Nothing to be saved or restored for now. */
>> +        for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) {
>> +            blob->reg_state.white_list[engine->guc_id].offsets[i] =
>> +                I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i));
>> +        }
>>      }
>>
>>      /*
>>


More information about the Intel-gfx mailing list