[Intel-gfx] [PATCH] drm/i915: Trigger hangcheck if we detect more a repeating missed IRQ

Wed Apr 11 22:32:55 CEST 2012

On Wed, 11 Apr 2012 09:18:15 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:

> On Tue, 10 Apr 2012 16:59:11 -0700, Ben Widawsky <ben at bwidawsk.net> wrote:
> > On Tue, 10 Apr 2012 17:00:41 +0100
> > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > 
> > > On the first instance we just wish to kick the waiters and see if that
> > > terminates the wait conditions. If it does not, then we do not want to
> > > keep retrying without ever making any forward progress and becoming
> > > stuck in a hangcheck loop.
> > > 
> > > Reported-and-tested-by: Lukas Hejtmanek <xhejtman at fi.muni.cz>
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48209
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > 
> > I'm still confused about the problem we are purportedly fixing.
> > 
> > This should happen if we've missed an irq (or the watchdog fired too
> > soon), and then fires again before the thread has actually woken up to
> > realize that it missed the first IRQ?
> > 
> > As for extracting the kick_ring bit of code for core hangcheck_elapsed,
> > that looks fine. I just don't quite understand the exact problem this
> > solves, and can't envision how we hit the case that the patch seems
> > to fix.
> 
> Sure, just look at the bug report for the garbage we wrote into the
> ringbuffers and how we ended up in an indefinite wait. This is not a
> defense against normal behaviour but against the driver screwing up.
> -Chris
> 

In that case this is
Reviewed-by: Ben Widawsky <ben at bwidawsk.net>

Though I am still pretty surprised that we have even seen this :|
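
For anyone reading this thread later, the escalation described in the commit
message (kick the waiters once, then treat a repeat as a hang) can be sketched
as a small standalone model. This is only an illustration under assumptions:
struct ring_model, enum check_result and hangcheck_tick() are hypothetical
names made up for this sketch, and none of it is the driver's actual code.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; not the i915 driver's real structures. */
struct ring_model {
    bool idle;          /* all submitted work appears to have completed */
    bool has_waiters;   /* a thread is still sleeping, waiting on the ring */
    int  missed_count;  /* consecutive checks that found a stuck waiter */
};

enum check_result {
    CHECK_OK,      /* nothing suspicious, counter reset */
    CHECK_KICKED,  /* first occurrence: wake the waiters and re-check later */
    CHECK_HUNG     /* repeated occurrence: stop retrying, declare a hang */
};

/*
 * One pass of the periodic check. An idle ring that still has waiters
 * looks like a missed IRQ. The first time we only kick the waiters; if
 * the same state is seen again on the next pass, kicking did not help,
 * so we escalate instead of kicking forever.
 */
static enum check_result hangcheck_tick(struct ring_model *ring)
{
    if (!(ring->idle && ring->has_waiters)) {
        ring->missed_count = 0;
        return CHECK_OK;
    }

    if (++ring->missed_count > 1)
        return CHECK_HUNG;

    return CHECK_KICKED;
}
```

With a ring that stays idle and keeps its waiters, the first call returns
CHECK_KICKED and the second returns CHECK_HUNG; once the waiters go away the
counter is reset, so a single missed IRQ never escalates on its own.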