[Intel-gfx] [PATCH] drm/i915: Inspect subunit states on hangcheck

Chris Wilson chris at chris-wilson.co.uk
Thu Dec 10 05:54:40 PST 2015

On Tue, Dec 01, 2015 at 05:56:12PM +0200, Mika Kuoppala wrote:
> If head seems stuck and engine in question is rcs,
> inspect subunit state transitions from undone to done,
> before deciding that this really is a hang instead of limited
> progress. Only account the transitions of subunits from
> undone to done once, to prevent unstable subunit states
> from keeping us falsely active.
> As this adds one extra step to the hangcheck heuristics
> before a hang is declared, it adds 1500ms to the time to detect
> a hang on the render ring, for a total of 7500ms. We could sample
> the subunit states on first head stuck condition but
> decide not to do so only in order to mimic old behaviour. This
> way the check order of promotion from seqno > acthd > instdone
> is consistently done.
> v2: Deal with unstable done states (Arun)
>     Clear instdone progress on head and seqno movement (Chris)
>     Report raw and accumulated instdone's in debugfs (Chris)
>     Return HANGCHECK_ACTIVE on undone->done
> References: https://bugs.freedesktop.org/show_bug.cgi?id=93029
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Dave Gordon <david.s.gordon at intel.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: Arun Siluvery <arun.siluvery at linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>

I feel slightly dubious about discarding the 1->0 transitions (as that
just means a shared function that was previously idle is now in use
again), but if they truly do fluctuate randomly, then accumulating
should mean we eventually escape.
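
To make the accumulation argument concrete, here is a rough userspace
sketch of the idea, not the code from the patch. The struct and the
function names (subunit_progress, subunits_progressed, subunits_reset)
are made up for illustration, and a single 32-bit word stands in for
whatever set of instdone registers is actually sampled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping, not an i915 structure. */
struct subunit_progress {
	uint32_t accu_done;	/* done bits seen since the last reset */
};

/*
 * Returns true only if some subunit reports done for the first time
 * since the last reset. A bit that drops back to 0 and later returns
 * to 1 is not counted again, so a flapping subunit cannot keep the
 * engine looking active forever.
 */
static bool subunits_progressed(struct subunit_progress *p, uint32_t instdone)
{
	uint32_t newly_done;

	assert(p != NULL);

	newly_done = instdone & ~p->accu_done;
	p->accu_done |= instdone;

	return newly_done != 0;
}

/* Assumed to be called whenever head or seqno is seen to move. */
static void subunits_reset(struct subunit_progress *p)
{
	assert(p != NULL);
	p->accu_done = 0;
}
```

With this shape, each bit can report progress at most once per reset, so
with randomly fluctuating bits the accumulated mask can only fill up and
the check eventually stops reporting activity.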

Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>

Chris Wilson, Intel Open Source Technology Centre
