[Intel-gfx] [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits

Thu Jul 3 18:00:45 CEST 2014

On Thu, 3 Jul 2014 16:51:11 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:

> On Thu, Jul 03, 2014 at 08:44:20AM -0700, Jesse Barnes wrote:
> > On Thu,  3 Jul 2014 08:09:01 +0100
> > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > 
> > > Since we rely on hangcheck to wake up and kick us out of an indefinite
> > > wait should the GPU ever stop functioning, it appears sensible that we
> > > should check that hangcheck is indeed active before starting that wait.
> > > This just prevents a driver error in the processing of hangcheck from
> > > appearing to hang the machine.
> 
> > Are there any bugs associated with this?
> 
> No open bugs. They have cropped up during dev though, and I think I am
> not alone. I believe that both Ben and I have tried to convince Daniel
> of the merits of having this security blanket.
>  
> > i915_rearm_hangcheck() or something might more accurately describe
> > what's going on here.
> 
> How about i915_ensure_hangcheck()? (I agree that rearm is better than
> check.)
>  
> > I suppose both of these paths are protected by the struct_mutex?  If
> > not, might we race and mod_timer() this twice from two threads in
> > succession?  I guess that's harmless...
> 
> Concurrently arming a timer within a jiffy or two isn't going to make
> too much difference, or even pushing an almost firing timer off by
> another hangcheck interval. Conversely, since we already have read the
> hangcheck counter, if the hangcheck does fire before we schedule(), that
> will immediately wake us up and we will spot the hang.
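
The ordering argument quoted above (snapshot the counter, make sure the timer is armed, then sleep) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the i915 code: every name here (sim_gpu, sim_timer, sim_ensure_hangcheck, sim_tick, sim_wait) is hypothetical, the "timer" is a plain struct advanced by a tick loop, and re-arming is modelled on the mod_timer() behaviour of simply moving the expiry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names come from i915. */
struct sim_timer {
	bool armed;
	unsigned long expires;		/* tick at which the timer fires */
};

struct sim_gpu {
	unsigned int reset_counter;	/* bumped when hangcheck declares a hang */
	struct sim_timer hangcheck;
};

/* Modelled on mod_timer(): (re)arm the timer to fire `interval` ticks from `now`. */
static void sim_ensure_hangcheck(struct sim_gpu *gpu, unsigned long now,
				 unsigned long interval)
{
	assert(interval > 0);
	gpu->hangcheck.armed = true;
	gpu->hangcheck.expires = now + interval;
}

/* Process tick `now`: an armed, expired timer fires once and bumps the counter. */
static void sim_tick(struct sim_gpu *gpu, unsigned long now)
{
	if (gpu->hangcheck.armed && now >= gpu->hangcheck.expires) {
		gpu->hangcheck.armed = false;
		gpu->reset_counter++;
	}
}

/*
 * Wait on a request that never completes (the GPU is assumed hung).
 * Returns the tick at which the waiter noticed the hang, or -1 if it
 * was still asleep after `max_ticks` ticks.
 */
static long sim_wait(struct sim_gpu *gpu, unsigned long start,
		     unsigned long max_ticks, bool ensure,
		     unsigned long interval)
{
	/* Read the counter before sleeping, so a later change is always seen. */
	unsigned int snapshot = gpu->reset_counter;

	if (ensure)
		sim_ensure_hangcheck(gpu, start, interval);

	for (unsigned long now = start; now < start + max_ticks; now++) {
		sim_tick(gpu, now);
		if (gpu->reset_counter != snapshot)
			return (long)now;	/* woken up, hang spotted */
	}
	return -1;				/* nothing ever woke us */
}
```

In this model, a waiter that starts with the timer disarmed and ensure set to false never wakes, while the same wait with ensure set to true returns after one interval. Calling sim_ensure_hangcheck() twice in quick succession only moves the expiry by the gap between the two calls, and a timer that fires after the snapshot is taken is noticed on the first check.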

Sounds good.  ensure_hangcheck() or update_hangcheck() are fine with me
too.

Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center