[Intel-gfx] [RFC 2/2] drm/i915: Use ABI engine class in error state ecode

Chris Wilson chris at chris-wilson.co.uk
Thu Nov 5 09:49:37 UTC 2020


Quoting Tvrtko Ursulin (2020-11-05 09:31:07)
> 
> On 04/11/2020 23:30, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-11-04 13:47:43)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> Instead of printing out the internal engine mask, which can change between
> >> kernel versions making it difficult to map to actual engines, present a
> >> bitmask of hanging engines ABI classes. For example:
> >>
> >>    [drm] GPU HANG: ecode 9:24dffffd:8, in gem_exec_schedu [1334]
> >>
> >> Notice the swapped the order of ecode and bitmask which makes the new
> >> versus old bug reports are obvious.
> >>
> >> Engine ABI class is useful to quickly categorize render vs media etc hangs
> >> in bug reports. Considering virtual engine even more so than the current
> >> scheme.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gpu_error.c | 13 +++++++------
> >>   1 file changed, 7 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> >> index 857db66cc4a3..e7d9af184d58 100644
> >> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> >> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> >> @@ -1659,17 +1659,16 @@ static u32 generate_ecode(const struct intel_engine_coredump *ee)
> >>   static const char *error_msg(struct i915_gpu_coredump *error)
> >>   {
> >>          struct intel_engine_coredump *first = NULL;
> >> +       unsigned int hung_classes = 0;
> >>          struct intel_gt_coredump *gt;
> >> -       intel_engine_mask_t engines;
> >>          int len;
> >>   
> >> -       engines = 0;
> >>          for (gt = error->gt; gt; gt = gt->next) {
> >>                  struct intel_engine_coredump *cs;
> >>   
> >>                  for (cs = gt->engine; cs; cs = cs->next) {
> >>                          if (cs->hung) {
> >> -                               engines |= cs->engine->mask;
> >> +                               hung_classes |= BIT(cs->engine->uabi_class);
> > 
> > Your argument makes sense.
> > 
> >>                                  if (!first)
> >>                                          first = cs;
> >>                          }
> >> @@ -1677,9 +1676,11 @@ static const char *error_msg(struct i915_gpu_coredump *error)
> >>          }
> >>   
> >>          len = scnprintf(error->error_msg, sizeof(error->error_msg),
> >> -                       "GPU HANG: ecode %d:%x:%08x",
> >> -                       INTEL_GEN(error->i915), engines,
> >> -                       generate_ecode(first));
> >> +                       "GPU HANG: ecode %d:%08x:%x",
> >> +                       INTEL_GEN(error->i915),
> >> +                       generate_ecode(first),
> >> +                       hung_classes);
> > 
> > I vote for keeping gen:engines:ecode order, for me that is biggest to
> > smallest.
> 
> It would not be obvious what kind of bits are in the mask then, say 
> looking from the ecode in maybe bugzilla titles and two different bugs 
> may be incorrectly marked as duplicate.

No one should be marking bugs as duplicate based on this string. It
really is that useless.

> Maybe instead of the order we 
> could change the separator(s)? Or add prefix/suffix to the mask?

I don't see the point; we've changed the construction several times.
-Chris


More information about the Intel-gfx mailing list