[Intel-gfx] [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits

Chris Wilson chris at chris-wilson.co.uk
Thu Jul 3 17:51:11 CEST 2014


On Thu, Jul 03, 2014 at 08:44:20AM -0700, Jesse Barnes wrote:
> On Thu,  3 Jul 2014 08:09:01 +0100
> Chris Wilson <chris at chris-wilson.co.uk> wrote:
> 
> > Since we rely on hangcheck to wake up and kick us out of an indefinite
> > wait should the GPU ever stop functioning, it appears sensible that we
> > should check that hangcheck is indeed active before starting that wait.
> > This just prevents a driver error in the processing of hangcheck from
> > appearing to hang the machine.

> Are there any bugs associated with this?

No open bugs. They have cropped up during development though, and I
think I am not alone. I believe that both Ben and I have tried to
convince Daniel of the merits of having this security blanket.
 
> i915_rearm_hangcheck() or something might more accurately describe
> what's going on here.

How about i915_ensure_hangcheck()? (I agree that rearm is better than
check.)
 
> I suppose both of these paths are protected by the struct_mutex?  If
> not, might we race and mod_timer() this twice from two threads in
> succession?  I guess that's harmless...

Concurrently arming a timer within a jiffy or two isn't going to make
much difference, nor is pushing an almost-firing timer off by another
hangcheck interval. Conversely, since we have already read the
hangcheck counter, if the hangcheck does fire before we schedule(), that
will immediately wake us up and we will spot the hang.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


