[Intel-gfx] [PATCH] drm/i915: Don't continually defer the hangcheck

Mika Kuoppala mika.kuoppala at linux.intel.com
Fri Nov 7 16:14:36 CET 2014


Chris Wilson <chris at chris-wilson.co.uk> writes:

> On Fri, Nov 07, 2014 at 04:28:33PM +0200, Mika Kuoppala wrote:
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > With multiple rings, we may continue to render on the blitter whilst
>> > executing an infinite shader on the render ring. As we currently rearm
>> > the timer with each execbuf, in this scenario the hangcheck will never
>> > fire and we will never detect the lockup on the render ring. Instead,
>> > only arm the timer once per hangcheck, so that hangcheck runs more
>> > frequently.
>> >
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/i915_irq.c | 9 +++++++--
>> >  1 file changed, 7 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> > index 318a6a0724d0..82b4d742aba5 100644
>> > --- a/drivers/gpu/drm/i915/i915_irq.c
>> > +++ b/drivers/gpu/drm/i915/i915_irq.c
>> > @@ -3039,11 +3039,16 @@ static void i915_hangcheck_elapsed(unsigned long data)
>> >  void i915_queue_hangcheck(struct drm_device *dev)
>> >  {
>> >  	struct drm_i915_private *dev_priv = dev->dev_private;
>> > +	struct timer_list *timer = &dev_priv->gpu_error.hangcheck_timer;
>> > +
>> >  	if (!i915.enable_hangcheck)
>> >  		return;
>> >  
>> > -	mod_timer(&dev_priv->gpu_error.hangcheck_timer,
>> > -		  round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
>> > +	if (timer_pending(timer))
>> > +		return;
>> > +
>> 
>> As this is called from both process and interrupt context, what
>> keeps us from messing up the timer bookkeeping? The lock in the timer code?
>> 
>> I am thinking that the other thread will hit the BUG_ON in add_timer().
>
> if (!timer_pending(timer))
> 	timer->expires = round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
> mod_timer(timer, timer->expires);
> ?

With this changed:

Reviewed-by: Mika Kuoppala <mika.kuoppala at intel.com>
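
For illustration only, here is a minimal userspace sketch of why the
suggested variant lets the hangcheck fire while the original rearming
does not. Everything in it is a stand-in and an assumption: mock_timer,
mock_mod_timer(), mock_tick(), simulate() and the HANGCHECK_JIFFIES
value are hypothetical, not the kernel timer API or the driver's real
constants, and round_jiffies_up() is left out for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hangcheck period in ticks; not the driver's real value. */
#define HANGCHECK_JIFFIES 150UL

/* Userspace stand-in for struct timer_list (assumption, not kernel API). */
struct mock_timer {
	bool pending;
	unsigned long expires;
};

/* Simulated clock, standing in for jiffies. */
unsigned long mock_jiffies;

/* Stand-in for mod_timer(): (re)arm the timer with the given expiry. */
void mock_mod_timer(struct mock_timer *t, unsigned long expires)
{
	t->expires = expires;
	t->pending = true;
}

/* Old behaviour: every submission pushes the deadline further out. */
void queue_hangcheck_old(struct mock_timer *t)
{
	mock_mod_timer(t, mock_jiffies + HANGCHECK_JIFFIES);
}

/* Suggested behaviour: pick a new expiry only if the timer is idle,
 * otherwise rearm with the expiry that is already set. */
void queue_hangcheck_new(struct mock_timer *t)
{
	if (!t->pending)
		t->expires = mock_jiffies + HANGCHECK_JIFFIES;
	mock_mod_timer(t, t->expires);
}

/* Advance the clock by delta; return true if the timer fired. */
bool mock_tick(struct mock_timer *t, unsigned long delta)
{
	mock_jiffies += delta;
	if (t->pending && mock_jiffies >= t->expires) {
		t->pending = false;
		return true;
	}
	return false;
}

/* Queue the hangcheck every `interval` ticks for `total` ticks,
 * as a busy ring would, and count how often the timer fires. */
int simulate(void (*queue)(struct mock_timer *),
	     unsigned long interval, unsigned long total)
{
	struct mock_timer t = { false, 0 };
	int fired = 0;

	mock_jiffies = 0;
	for (unsigned long now = 0; now < total; now += interval) {
		queue(&t);
		if (mock_tick(&t, interval))
			fired++;
	}
	return fired;
}
```

In this model, simulate(queue_hangcheck_old, 10, 1000) never fires,
because each submission moves the deadline before it is reached, while
simulate(queue_hangcheck_new, 10, 1000) fires once per period. The sketch
says nothing about locking or about the process versus interrupt context
question raised above; it only models the expiry arithmetic.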

> Or we just switch to using delayed_work() with a slightly less risky
> interface.

We should, after the dust settles.

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre


