[Intel-gfx] [RFC 10/11] drm/i915: Debugfs interface for per-engine hang recovery.

Mon Jun 8 10:45:55 PDT 2015

On Mon, Jun 08, 2015 at 06:03:28PM +0100, Tomas Elf wrote:
> 1. The i915_wedged_set function allows us to schedule three forms of hang recovery:
> 
> 	a) Legacy hang recovery: By passing e.g. -1 we trigger the legacy full
> 	GPU reset recovery path.
> 
> 	b) Single engine hang recovery: By passing an engine ID in the interval
> 	of [0, I915_NUM_RINGS) we can schedule hang recovery of any single
> 	engine assuming that the context submission consistency requirements
> 	are met (otherwise the hang recovery path will simply exit early and
> 	wait for another hang detection). The values are assumed to use up bits
> 	3:0 only since we certainly do not support as many as 16 engines.
> 
> 	This mode is supported since there are several legacy test applications
> 	that rely on this interface.

Are there? I don't see any in igt, and let's not start treating debugfs
as ABI.

> 	c) Multiple engine hang recovery: By passing in an engine flag mask in
> 	bits 31:8 (bit 8 corresponds to engine 0 = RCS, bit 9 corresponds to
> 	engine 1 = VCS, etc.) we can schedule any combination of engine hang
> 	recoveries as we please. For example, by passing in the value 0x3 << 8
> 	we would schedule hang recovery for engines 0 and 1 (RCS and VCS) at
> 	the same time.

Seems fine. But I don't see the reason for the extra complication.

> 	If bits in fields 3:0 and 31:8 are both used then single engine hang
> 	recovery mode takes precedence and bits 31:8 are ignored.
> 
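For reference, here is a rough userspace sketch of how a value written to
this file would decode under the three modes quoted above. It is not taken
from the patch. The names decode_wedged_value, struct recovery_request and
NUM_ENGINES are hypothetical, and NUM_ENGINES = 5 is only a stand-in for
I915_NUM_RINGS. The sketch also assumes that bits 3:0 count as "used" when
they are non-zero, and that an out-of-range engine ID (such as the 0xf left
over from -1) falls back to the legacy full GPU reset.

```c
#include <stdint.h>

#define NUM_ENGINES 5 /* assumption: stand-in for I915_NUM_RINGS */

enum recovery_mode {
	RECOVER_FULL_GPU, /* legacy full GPU reset */
	RECOVER_ENGINES,  /* per-engine recovery, see engine_mask */
};

struct recovery_request {
	enum recovery_mode mode;
	uint32_t engine_mask; /* bit n set = recover engine n (0 = RCS, 1 = VCS) */
};

/* Hypothetical decoder for the value written to the wedged debugfs file. */
static struct recovery_request decode_wedged_value(uint64_t val)
{
	struct recovery_request req = { RECOVER_FULL_GPU, 0 };
	uint32_t id = (uint32_t)(val & 0xf);               /* bits 3:0  */
	uint32_t mask = (uint32_t)((val >> 8) & 0xffffff); /* bits 31:8 */

	/* Multi-engine mode: only when bits 3:0 are unused (assumed: zero). */
	if (id == 0 && mask != 0) {
		req.mode = RECOVER_ENGINES;
		req.engine_mask = mask & ((1u << NUM_ENGINES) - 1);
		return req;
	}

	/* Single-engine mode takes precedence; bits 31:8 are ignored. */
	if (id < NUM_ENGINES) {
		req.mode = RECOVER_ENGINES;
		req.engine_mask = 1u << id;
		return req;
	}

	/* Anything else, e.g. -1, requests the legacy full GPU reset. */
	return req;
}
```

Under these assumptions 0x3 << 8 selects engines 0 and 1, 0x301 selects
only engine 1 because the single-engine field wins, and -1 ends up on the
legacy path without needing a special case. Note that a plain 0 means
"engine 0" while 0x100 also means "engine 0", which is the kind of overlap
the two encodings introduce.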
> 2. The i915_wedged_get function produces a set of statistics related to:

Add it to hangcheck_info instead.

Could i915_wedged_get be updated to report the ring mask of wedged rings
instead, if that concept exists?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre