[PATCH] drm/i915/guc: Fix missing ecodes
Teres Alexis, Alan Previn
alan.previn.teres.alexis at intel.com
Thu Jan 26 19:17:53 UTC 2023
Firstly, thanks for catching this miss.
I only have one trivial nit and one non-blocker ask, and the non-blocker
ask will not impact the patch intent as it merely tweaks an existing
debug message, so I believe we have an rb:
Reviewed-by: Alan Previn <alan.previn.teres.alexis at intel.com>
On Tue, 2023-01-24 at 16:49 -0800, Harrison, John C wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> Error captures are tagged with an 'ecode'. This is a pseudo-unique magic
> number that is meant to distinguish similar seeming bugs with
> different underlying signatures. It is a combination of two ring state
> registers. Unfortunately, the register state being used is only valid
> in execlist mode. In GuC mode, the register state exists in a separate
> list of arbitrary register address/value pairs rather than the named
> entry structure. So, search through that list to find the two exciting
> registers and copy them over to the structure's named members.
>
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Fixes: a6f0f9cf330a ("drm/i915/guc: Plumb GuC-capture into gpu_coredump")
> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> Cc: Jani Nikula <jani.nikula at linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Cc: Matt Roper <matthew.d.roper at intel.com>
> Cc: Aravind Iddamsetty <aravind.iddamsetty at intel.com>
> Cc: Michael Cheng <michael.cheng at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Bruce Chang <yu.bruce.chang at intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Matthew Auld <matthew.auld at intel.com>
> ---
> .../gpu/drm/i915/gt/uc/intel_guc_capture.c | 22 +++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> index 1c1b85073b4bd..4e0b06ceed96d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> @@ -1571,6 +1571,27 @@ int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *ebuf,
>
> #endif //CONFIG_DRM_I915_CAPTURE_ERROR
>
> +static void guc_capture_find_ecode(struct intel_engine_coredump *ee)
> +{
> + struct gcap_reg_list_info *reginfo;
> + struct guc_mmio_reg *regs;
> + i915_reg_t reg_ipehr = RING_IPEHR(0);
> + i915_reg_t reg_instdone = RING_INSTDONE(0);
> + int i;
> +
> + if (!ee->guc_capture_node)
> + return;
> +
> + reginfo = ee->guc_capture_node->reginfo + GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE;
> + regs = reginfo->regs;
> + for (i = 0; i < reginfo->num_regs; i++) {
> + if (regs[i].offset == reg_ipehr.reg)
> + ee->ipehr = regs[i].value;
> + if (regs[i].offset == reg_instdone.reg)
nit: "else if"?
> + ee->instdone.instdone = regs[i].value;
> + }
> +}
> +
> void intel_guc_capture_free_node(struct intel_engine_coredump *ee)
> {
> if (!ee || !ee->guc_capture_node)
> @@ -1612,6 +1633,7 @@ void intel_guc_capture_get_matching_node(struct intel_gt *gt,
> list_del(&n->link);
> ee->guc_capture_node = n;
> ee->capture = guc->capture;
> + guc_capture_find_ecode(ee);
> return;
> }
> }
alan: only one non-blocker request:
while we are here, could we update the debug message when we can't find a matching captured node?
Current code:
drm_dbg(&i915->drm, "GuC capture can't match ee to node\n");
New suggestion:
drm_dbg(&i915->drm, "GuC capture can't find node for ee-ctx: lrca = 0x%08x | gucid = 0x%08x\n",
ce->lrc.lrca, ce->guc_id.id);
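On the "else if" nit above, here is a minimal standalone sketch of the same lookup, in case it helps to see the chained form in isolation. The struct names, the function name, and the returned match count are hypothetical stand-ins for illustration, not the i915 types or the patch's behaviour. It assumes the two target offsets are distinct, so one list entry can match at most one of them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one captured register address/value pair. */
struct reg_pair {
	uint32_t offset;
	uint32_t value;
};

/* Hypothetical stand-in for the named members the ecode is built from. */
struct named_regs {
	uint32_t ipehr;
	uint32_t instdone;
};

/*
 * Walk the list once and copy the two registers of interest into the
 * named members. Since an entry can match at most one target offset,
 * the second comparison is chained with "else if".
 * Returns the number of entries that matched either target.
 */
static int find_ecode_regs(const struct reg_pair *regs, size_t num_regs,
			   uint32_t ipehr_offset, uint32_t instdone_offset,
			   struct named_regs *out)
{
	int found = 0;
	size_t i;

	for (i = 0; i < num_regs; i++) {
		if (regs[i].offset == ipehr_offset) {
			out->ipehr = regs[i].value;
			found++;
		} else if (regs[i].offset == instdone_offset) {
			out->instdone = regs[i].value;
			found++;
		}
	}

	return found;
}
```

The behaviour is the same as with two independent "if" tests as long as the offsets differ; the chained form just skips the second comparison once the first has matched.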