[Intel-gfx] [RFC 2/2] drm/i915: Use ABI engine class in error state ecode

Chris Wilson chris at chris-wilson.co.uk
Wed Nov 4 23:30:18 UTC 2020


Quoting Tvrtko Ursulin (2020-11-04 13:47:43)
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> Instead of printing out the internal engine mask, which can change between
> kernel versions making it difficult to map to actual engines, present a
> bitmask of hanging engines ABI classes. For example:
> 
>   [drm] GPU HANG: ecode 9:24dffffd:8, in gem_exec_schedu [1334]
> 
> Notice the swapped the order of ecode and bitmask which makes the new
> versus old bug reports are obvious.
> 
> Engine ABI class is useful to quickly categorize render vs media etc hangs
> in bug reports. Considering virtual engine even more so than the current
> scheme.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 857db66cc4a3..e7d9af184d58 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1659,17 +1659,16 @@ static u32 generate_ecode(const struct intel_engine_coredump *ee)
>  static const char *error_msg(struct i915_gpu_coredump *error)
>  {
>         struct intel_engine_coredump *first = NULL;
> +       unsigned int hung_classes = 0;
>         struct intel_gt_coredump *gt;
> -       intel_engine_mask_t engines;
>         int len;
>  
> -       engines = 0;
>         for (gt = error->gt; gt; gt = gt->next) {
>                 struct intel_engine_coredump *cs;
>  
>                 for (cs = gt->engine; cs; cs = cs->next) {
>                         if (cs->hung) {
> -                               engines |= cs->engine->mask;
> +                               hung_classes |= BIT(cs->engine->uabi_class);

Your argument makes sense.

>                                 if (!first)
>                                         first = cs;
>                         }
> @@ -1677,9 +1676,11 @@ static const char *error_msg(struct i915_gpu_coredump *error)
>         }
>  
>         len = scnprintf(error->error_msg, sizeof(error->error_msg),
> -                       "GPU HANG: ecode %d:%x:%08x",
> -                       INTEL_GEN(error->i915), engines,
> -                       generate_ecode(first));
> +                       "GPU HANG: ecode %d:%08x:%x",
> +                       INTEL_GEN(error->i915),
> +                       generate_ecode(first),
> +                       hung_classes);

I vote for keeping gen:engines:ecode order, for me that is biggest to
smallest.
-Chris


More information about the Intel-gfx mailing list