[Intel-gfx] [PATCH] drm/i915: Trigger hangcheck if we detect a repeating missed IRQ

Wed Apr 11 10:18:15 CEST 2012

On Tue, 10 Apr 2012 16:59:11 -0700, Ben Widawsky <ben at bwidawsk.net> wrote:
> On Tue, 10 Apr 2012 17:00:41 +0100
> Chris Wilson <chris at chris-wilson.co.uk> wrote:
> 
> > On the first instance we just wish to kick the waiters and see if that
> > terminates the wait conditions. If it does not, then we do not want to
> > keep retrying without ever making any forward progress and becoming
> > stuck in a hangcheck loop.
> > 
> > Reported-and-tested-by: Lukas Hejtmanek <xhejtman at fi.muni.cz>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48209
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> 
> I'm still confused about the problem we are purportedly fixing.
> 
> This should happen if we've missed an IRQ (or the watchdog fired too
> soon), and then it fires again before the thread has actually woken up
> to realize that it missed the first IRQ?
> 
> As for extracting the kick_ring bit of code for core hangcheck_elapsed,
> that looks fine. I just don't quite understand the exact problem this
> solves, and can't envision how we hit the case that the patch is
> supposed to fix.

Sure, just look at the bug report for the garbage we wrote into the
ringbuffers and how we ended up in an indefinite wait. This is not a
defense against normal behaviour but against the driver screwing up.
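
To make the intent concrete, here is a rough standalone model of the
policy described in the commit message: kick the waiters the first time
an idle ring still has someone sleeping on it, and escalate if the same
condition shows up again on the next tick. This is only a sketch, not
the patch itself; every name in it (ring_model, hangcheck_model,
hangcheck_tick, the HANGCHECK_* values) is made up for illustration and
does not exist in i915.

```c
#include <stdbool.h>

/* Hypothetical model of a ring, not the i915 structure. */
struct ring_model {
	bool idle;           /* all submitted work appears to have completed */
	bool waiter_pending; /* a thread is still asleep waiting on the ring */
};

/* Hypothetical per-device hangcheck state. */
struct hangcheck_model {
	int stuck_count;     /* consecutive ticks that saw a missed IRQ */
};

enum hangcheck_action {
	HANGCHECK_NONE,      /* nothing suspicious this tick */
	HANGCHECK_KICK,      /* first sighting: just wake the waiters */
	HANGCHECK_RESET,     /* repeat sighting: treat it as a hang */
};

/*
 * Called once per hangcheck timer tick. An idle ring with a sleeping
 * waiter is what a missed IRQ looks like. The first time we only kick;
 * if kicking did not end the wait, we stop retrying and escalate.
 */
enum hangcheck_action hangcheck_tick(struct hangcheck_model *hc,
				     const struct ring_model *ring)
{
	if (!(ring->idle && ring->waiter_pending)) {
		hc->stuck_count = 0; /* forward progress, forget history */
		return HANGCHECK_NONE;
	}

	if (hc->stuck_count++ == 0)
		return HANGCHECK_KICK;

	return HANGCHECK_RESET;
}
```

With garbage in the ringbuffer the waiter never sees its condition
satisfied, so the second tick in this model returns HANGCHECK_RESET
instead of kicking forever.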
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre