[Intel-gfx] [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits
Jesse Barnes
jbarnes at virtuousgeek.org
Thu Jul 3 18:00:45 CEST 2014
On Thu, 3 Jul 2014 16:51:11 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Thu, Jul 03, 2014 at 08:44:20AM -0700, Jesse Barnes wrote:
> > On Thu, 3 Jul 2014 08:09:01 +0100
> > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >
> > > Since we rely on hangcheck to wait up and kick us out of an indefinite
> > > wait should the GPU ever stop functioning, it appears sensible that we
> > > should check that hangcheck is indeed active before starting that wait.
> > > This just prevents a driver error in the processing of hangcheck from
> > > appearing to hang the machine.
>
> > Are there any bugs associated with this?
>
> No open bugs. They have cropped up during dev though, and I think I am
> not alone. I believe that both Ben and I have tried to convince Daniel
> the merits of having this security blanket.
>
> > i915_rearm_hangcheck() or something might more accurately describe
> > what's going on here.
>
> How about i915_ensure_hangcheck()? (I agree that rearm is better than
> check.)
>
> > I suppose both of these paths are protected by the struct_mutex? If
> > not, might we race and mod_timer() this twice from two threads in
> > succession? I guess that's harmless...
>
> Concurrently arming a timer within a jiffie or two isn't going to make
> too much difference, or even pushing an almost firing timer off by
> another hangcheck interval. Conversely, since we already have read the
> hangcheck counter, if the hangcheck does fire before we schedule(), that
> will immediately wake us up and we will spot the hang.
Sounds good. ensure_hangcheck() or update_hangcheck() are fine with me
too.
Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>
--
Jesse Barnes, Intel Open Source Technology Center
More information about the Intel-gfx
mailing list