[Intel-gfx] [PATCH] drm/i915/guc: Fix missing ecodes

Teres Alexis, Alan Previn alan.previn.teres.alexis at intel.com
Thu Jan 26 19:17:53 UTC 2023


Firstly, thanks for catching this miss.
I only have one trivial nit and one non-blocker ask, and the non-blocker
ask will not impact the patch intent as it merely tweaks an existing
debug message, so I believe we have an rb:

Reviewed-by: Alan Previn <alan.previn.teres.alexis at intel.com>

On Tue, 2023-01-24 at 16:49 -0800, Harrison, John C wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
> 
> Error captures are tagged with an 'ecode'. This is a pseudo-unique magic
> number that is meant to distinguish similar seeming bugs with
> different underlying signatures. It is a combination of two ring state
> registers. Unfortunately, the register state being used is only valid
> in execlist mode. In GuC mode, the register state exists in a separate
> list of arbitrary register address/value pairs rather than the named
> entry structure. So, search through that list to find the two exciting
> registers and copy them over to the structure's named members.
> 
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Fixes: a6f0f9cf330a ("drm/i915/guc: Plumb GuC-capture into gpu_coredump")
> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> Cc: Jani Nikula <jani.nikula at linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Cc: Matt Roper <matthew.d.roper at intel.com>
> Cc: Aravind Iddamsetty <aravind.iddamsetty at intel.com>
> Cc: Michael Cheng <michael.cheng at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Bruce Chang <yu.bruce.chang at intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Matthew Auld <matthew.auld at intel.com>
> ---
>  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 22 +++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> index 1c1b85073b4bd..4e0b06ceed96d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> @@ -1571,6 +1571,27 @@ int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *ebuf,
>  
>  #endif //CONFIG_DRM_I915_CAPTURE_ERROR
>  
> +static void guc_capture_find_ecode(struct intel_engine_coredump *ee)
> +{
> +       struct gcap_reg_list_info *reginfo;
> +       struct guc_mmio_reg *regs;
> +       i915_reg_t reg_ipehr = RING_IPEHR(0);
> +       i915_reg_t reg_instdone = RING_INSTDONE(0);
> +       int i;
> +
> +       if (!ee->guc_capture_node)
> +               return;
> +
> +       reginfo = ee->guc_capture_node->reginfo + GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE;
> +       regs = reginfo->regs;
> +       for (i = 0; i < reginfo->num_regs; i++) {
> +               if (regs[i].offset == reg_ipehr.reg)
> +                       ee->ipehr = regs[i].value;
> +               if (regs[i].offset == reg_instdone.reg)
alan: nit: "else if"? (a given offset can only match one of the two registers)
> +                       ee->instdone.instdone = regs[i].value;
> +       }
> +}
> +
>  void intel_guc_capture_free_node(struct intel_engine_coredump *ee)
>  {
>         if (!ee || !ee->guc_capture_node)
> @@ -1612,6 +1633,7 @@ void intel_guc_capture_get_matching_node(struct intel_gt *gt,
>                         list_del(&n->link);
>                         ee->guc_capture_node = n;
>                         ee->capture = guc->capture;
> +                       guc_capture_find_ecode(ee);
>                         return;
>                 }
>         }

alan: only one non-blocker request:
while we are here, could we update the debug message that is printed when we can't find a matching captured node?
Current code:
	drm_dbg(&i915->drm, "GuC capture can't match ee to node\n");
New suggestion:
	drm_dbg(&i915->drm, "GuC capture can't find node for ee-ctx: lrca = 0x%08x | gucid = 0x%08x\n",
		ce->lrc.lrca, ce->guc_id.id);
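
For illustration only, here is a standalone sketch of the lookup described in the commit message, written with the "else if" form from the nit. It is not the driver code: the struct names, the function name and the two offset macros are hypothetical stand-ins chosen so the snippet builds outside the kernel, and the offset values are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one captured address/value pair. */
struct example_mmio_reg {
	uint32_t offset;
	uint32_t value;
};

/* Hypothetical stand-in for the named members the ecode is built from. */
struct example_ecode_regs {
	uint32_t ipehr;
	uint32_t instdone;
};

/* Placeholder offsets, assumed for this example only. */
#define EXAMPLE_IPEHR_OFFSET    0x68u
#define EXAMPLE_INSTDONE_OFFSET 0x6cu

/*
 * Walk the captured register list and copy the two registers of interest
 * into the named members. Entries with any other offset are ignored, and
 * a member is left untouched if its register is not in the list.
 */
static void example_find_ecode(const struct example_mmio_reg *regs,
			       size_t num_regs,
			       struct example_ecode_regs *out)
{
	size_t i;

	assert(out != NULL);

	for (i = 0; i < num_regs; i++) {
		if (regs[i].offset == EXAMPLE_IPEHR_OFFSET)
			out->ipehr = regs[i].value;
		else if (regs[i].offset == EXAMPLE_INSTDONE_OFFSET)
			out->instdone = regs[i].value;
	}
}
```

A caller would zero-initialise the output struct, pass in the captured list and its length, and then read the two members. The search is linear in the number of captured registers.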




