[Intel-gfx] [RFC 2/2] drm/i915: Use user engine names in error state ecode
Chris Wilson
chris at chris-wilson.co.uk
Wed Nov 4 13:21:49 UTC 2020
Quoting Tvrtko Ursulin (2020-11-04 13:06:43)
>
> On 04/11/2020 12:33, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-11-04 12:20:43)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> Instead of printing out the internal engine mask, which can change between
> >> kernel versions making it difficult to map to actual engines, list user
> >> friendly engine names in the ecode string. For example:
> >
> > Nah. It's a nonsense number, just exists for quick and futile discrimination.
> > Trying to interpret it is pointless.
> >
> > There's very little value to be gained from it, it should just serve as a
> > tell-tale that we have captured an error state. The action and impact of
> > the reset should be separately recorded.
>
> My problem with the nonsense number is that we have it, but that it is
> unstable and people are interpreting it.
>
> How about a bitmask of uabi classes instead? As you can see I really
> want something from the ABI-land, or not at all. Classes might be just
> the thing for the purpose of a signature.

You can probably tell I've been pushing for the not-at-all :)

I've personally not found it helpful, it's too simplistic and unstable
even for repeated GL hangs. The concept of having a hash that can
summarise the hang is definitely a good idea, but the input to that hash
is flawed.

Given that we record the reset action, and the context that was
impacted, I wonder how much we need to say here. Just announce a new
error state has been captured?

-Chris
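
[Editorial sketch, not part of the original message.] The "bitmask of
uabi classes" suggested in the quoted text could look roughly like the
following. This is a minimal standalone illustration, not driver code:
the enum is a local copy whose values are assumed to match
enum drm_i915_gem_engine_class in include/uapi/drm/i915_drm.h, and
sketch_class_mask() is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Local copy of the uabi engine class values. Assumption: these match
 * enum drm_i915_gem_engine_class in include/uapi/drm/i915_drm.h.
 */
enum sketch_engine_class {
	SKETCH_CLASS_RENDER = 0,
	SKETCH_CLASS_COPY = 1,
	SKETCH_CLASS_VIDEO = 2,
	SKETCH_CLASS_VIDEO_ENHANCE = 3,
};

/*
 * Hypothetical helper: fold the uabi classes of the affected engines
 * into a single mask, one bit per class.
 */
static uint32_t sketch_class_mask(const enum sketch_engine_class *classes,
				  size_t count)
{
	uint32_t mask = 0;
	size_t i;

	for (i = 0; i < count; i++)
		mask |= UINT32_C(1) << classes[i];

	return mask;
}
```

Because the mask is keyed on class rather than on the internal engine
id, two instances of the same class set the same bit. A hang involving
the render engine and two video engines would give 0x5 regardless of
how the kernel numbers its engines internally.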