[Intel-gfx] [PATCH 2/6] drm/i915: Decouple hang detection from hangcheck period

Chris Wilson chris at chris-wilson.co.uk
Wed Nov 16 08:07:36 UTC 2016


On Tue, Nov 15, 2016 at 04:36:32PM +0200, Mika Kuoppala wrote:
> +static bool
> +hangcheck_engine_stall(struct intel_engine_cs *engine,
> +		       struct intel_engine_hangcheck *hc)
> +{
> +	const unsigned long last_action = engine->hangcheck.action_timestamp;
>  
> -		memset(&engine->hangcheck.instdone, 0,
> -		       sizeof(engine->hangcheck.instdone));
> -		break;
> +	if (hc->action == HANGCHECK_ACTIVE_SEQNO ||
> +	    hc->action == HANGCHECK_IDLE)
> +		return false;
> +
> +	if (time_before(jiffies, last_action + HANGCHECK_HUNG_JIFFIES))
> +		return false;
> +
> +	if (time_before(jiffies, last_action + HANGCHECK_STUCK_JIFFIES))
> +		if (hc->action != HANGCHECK_HUNG)
> +			return false;

Coming back to this, this is very confusing as it is written. As you
write it, the primary decision is based on time "if time since last
action is greater than threshold, declare stuck or hang". That doesn't
convey that you only want to take certain action if the head hasn't
moved for N seconds, or if the seqno hasn't changed for M seconds.

I kind of expect something more like:

unsigned long timeout;

switch (hc->action) {
case ACTIVE:
case IDLE:
	return false;

case NO_HEAD_MOVEMENT:
	timeout = 5s;
	break;

case NO_SEQNO_ADVANCE:
	timeout = 20s
	break;
}

return time_after(jiffies, last_action + timeout);
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list