[Intel-gfx] [RFC 2/2] drm/i915: Use ABI engine class in error state ecode
Chris Wilson
chris at chris-wilson.co.uk
Wed Nov 4 23:30:18 UTC 2020
Quoting Tvrtko Ursulin (2020-11-04 13:47:43)
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> Instead of printing out the internal engine mask, which can change between
> kernel versions making it difficult to map to actual engines, present a
> bitmask of the hung engines' ABI classes. For example:
>
> [drm] GPU HANG: ecode 9:24dffffd:8, in gem_exec_schedu [1334]
>
> Notice the swapped order of ecode and bitmask, which makes new versus old
> bug reports obvious.
>
> Engine ABI class is useful to quickly categorize render vs media etc. hangs
> in bug reports, and with virtual engines even more so than the current
> scheme.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
> drivers/gpu/drm/i915/i915_gpu_error.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 857db66cc4a3..e7d9af184d58 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1659,17 +1659,16 @@ static u32 generate_ecode(const struct intel_engine_coredump *ee)
> static const char *error_msg(struct i915_gpu_coredump *error)
> {
> struct intel_engine_coredump *first = NULL;
> + unsigned int hung_classes = 0;
> struct intel_gt_coredump *gt;
> - intel_engine_mask_t engines;
> int len;
>
> - engines = 0;
> for (gt = error->gt; gt; gt = gt->next) {
> struct intel_engine_coredump *cs;
>
> for (cs = gt->engine; cs; cs = cs->next) {
> if (cs->hung) {
> - engines |= cs->engine->mask;
> + hung_classes |= BIT(cs->engine->uabi_class);
Your argument makes sense.
> if (!first)
> first = cs;
> }
> @@ -1677,9 +1676,11 @@ static const char *error_msg(struct i915_gpu_coredump *error)
> }
>
> len = scnprintf(error->error_msg, sizeof(error->error_msg),
> - "GPU HANG: ecode %d:%x:%08x",
> - INTEL_GEN(error->i915), engines,
> - generate_ecode(first));
> + "GPU HANG: ecode %d:%08x:%x",
> + INTEL_GEN(error->i915),
> + generate_ecode(first),
> + hung_classes);
I vote for keeping the gen:engines:ecode order; to me that reads biggest to
smallest.
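
For illustration only, here is a standalone userspace sketch of what that
would look like: a bitmask of hung uABI classes, printed in
gen:classes:ecode order. It is not the kernel code. struct engine,
hung_class_mask() and format_hang() are hypothetical names, and the class
values are assumed to mirror I915_ENGINE_CLASS_* from the uAPI header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror I915_ENGINE_CLASS_* in the i915 uAPI header. */
enum {
	CLASS_RENDER = 0,
	CLASS_COPY = 1,
	CLASS_VIDEO = 2,
	CLASS_VIDEO_ENHANCE = 3,
};

/* Hypothetical stand-in for the per-engine coredump state. */
struct engine {
	unsigned int uabi_class;
	int hung;
};

/* Set one bit per uABI class that has at least one hung engine. */
static unsigned int hung_class_mask(const struct engine *e, size_t n)
{
	unsigned int mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (e[i].hung)
			mask |= 1u << e[i].uabi_class;

	return mask;
}

/* Format the message in gen:classes:ecode order (biggest to smallest). */
static int format_hang(char *buf, size_t len, int gen,
		       unsigned int classes, unsigned int ecode)
{
	assert(buf && len);
	return snprintf(buf, len, "GPU HANG: ecode %d:%x:%08x",
			gen, classes, ecode);
}
```

With a hung render engine and a hung video engine on gen9, the mask is 0x5
and the message reads "GPU HANG: ecode 9:5:24dffffd" for the ecode from the
example in the commit message.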
-Chris