[Intel-gfx] [RFC] drm/i915/hangcheck: Look at instdone for all engines

Chris Wilson chris at chris-wilson.co.uk
Mon Jun 10 13:28:29 UTC 2019


Quoting Tvrtko Ursulin (2019-06-10 14:16:32)
> 
> On 10/06/2019 14:05, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-06-10 14:01:10)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> It seems intel_engine_get_instdone is able to get instdone for all engines
> >> but intel_hangcheck.c/subunits_stuck decides to ignore it for non render.
> >>
> >> We can just drop the check in subunits_stuck since the checks on
> >> unavailable fields will always return stuck, which, when bitwise ANDed
> >> with the potentially unstuck instdone, is harmless.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> >> ---
> >> Would actually using the available data improve hang detection?
> > 
> > No, just prolong it :)
> 
> I was thinking activity on instdone would prolong it, but correctly so,
> no? If functional blocks have shown a change of status, then we try again
> until they too get stuck. Some improvement in false positives, but
> marginal, I know, given the hangcheck period.

It's a lot more fuzzy when you look at how we OR together the bits for
the history. Not that you are wrong, just being an old curmudgeon. I'm
not really sold on whether it makes any sense at the granularity of our
inspection, currently sampling every ~1.5s.
-Chris
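
For readers following the thread, the "OR together the bits for the
history" point and the "unavailable fields always return stuck" point can be
sketched roughly as below. This is a simplified model, not the driver code:
the struct, the two-field layout and the function names
(sample_unchanged, subunits_stuck_sketch) are hypothetical, and it assumes
that a field an engine does not provide simply reads back as zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-field model of an instdone snapshot. */
struct instdone_sample {
	uint32_t instdone;
	uint32_t slice_common;	/* assumed to read as 0 where unavailable */
};

/*
 * Fold the current sample into the OR-accumulated history and report
 * whether it contributed any bit that had not been seen before.
 */
static bool sample_unchanged(uint32_t current, uint32_t *history)
{
	uint32_t merged = current | *history;
	bool unchanged = (merged == *history);

	*history = merged;
	return unchanged;
}

/*
 * "Stuck" only if no field showed a new bit. A field that always reads
 * zero never adds a bit, so it always reports unchanged and cannot turn
 * an unstuck result into a stuck one when the results are ANDed.
 */
static bool subunits_stuck_sketch(const struct instdone_sample *now,
				  struct instdone_sample *accu)
{
	bool stuck;

	/* Non-short-circuit AND so every history field gets updated. */
	stuck = sample_unchanged(now->instdone, &accu->instdone);
	stuck &= sample_unchanged(now->slice_common, &accu->slice_common);

	return stuck;
}
```

In this model a bit that has been seen once stays in the history, so a
unit that keeps toggling the same bit between samples looks identical to
one that is idle. That is one way to read the "fuzzy" remark: with a coarse
sampling period, only first-time transitions register as progress.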