[Intel-gfx] [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt

Fri Jul 22 07:57:24 UTC 2016

On Thu, Jul 21, 2016 at 07:57:39AM +0100, Chris Wilson wrote:
> As the interrupt wakeup counter only increments when we have a waiter,
> before testing to see if that counter is unchanged we have to first
> check that we do expect it to change (i.e. we have a waiter).
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 7104dc1463eb..45afcdfe89b1 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3062,7 +3062,9 @@ static unsigned long kick_waiters(struct intel_engine_cs *engine)
>  	struct drm_i915_private *i915 = engine->i915;
>  	unsigned long irq_count = READ_ONCE(engine->breadcrumbs.irq_wakeups);
>  
> -	if (engine->hangcheck.user_interrupts == irq_count &&
> +	rcu_read_lock();
> +	if (intel_engine_wakeup(engine) &&
> +	    engine->hangcheck.user_interrupts == irq_count &&

Sigh. This completely nerfs the detection of stuck waiters.
It should be:
	if (engine->hangcheck.user_interrupts == irq_count &&
	    intel_engine_wakeup(engine) &&
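
To illustrate why the order of the two tests matters, here is a minimal
userspace sketch. It is not the i915 code: struct toy_engine, toy_wakeup()
and both check_*() helpers are hypothetical names, and it assumes the wakeup
call has a side effect (it kicks the waiter) as well as reporting whether a
waiter exists. Since && short-circuits, whichever test is written first
decides whether the other one runs at all.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the engine state; not the i915 structures. */
struct toy_engine {
	unsigned long irq_wakeups;     /* bumped by the (simulated) interrupt */
	unsigned long seen_interrupts; /* value recorded at the previous check */
	bool has_waiter;               /* is anybody waiting on this engine? */
	unsigned int wakeup_calls;     /* how many times the waiter was kicked */
};

/*
 * Assumed model of a wakeup with a side effect: kick the waiter and
 * report whether there was one to kick.
 */
static bool toy_wakeup(struct toy_engine *e)
{
	e->wakeup_calls++;
	return e->has_waiter;
}

/* Ordering from the quoted patch: the wakeup runs on every check. */
static bool check_wakeup_first(struct toy_engine *e)
{
	return toy_wakeup(e) && e->seen_interrupts == e->irq_wakeups;
}

/* Ordering from the reply: the wakeup runs only if the counter is unchanged. */
static bool check_counter_first(struct toy_engine *e)
{
	return e->seen_interrupts == e->irq_wakeups && toy_wakeup(e);
}
```

With the counter still advancing, check_wakeup_first() kicks the waiter
anyway, whereas check_counter_first() leaves it alone and only kicks once
the counter has stopped moving.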

The test itself doesn't imply a missed interrupt either; there may be a
valid long-lived batch causing a delay in the waiter. We can do better
if we allow ourselves to take a spinlock here.
-Chris


-- 
Chris Wilson, Intel Open Source Technology Centre