[Intel-gfx] [PATCH] drm/i915: Trigger hangcheck if we detect more a repeating missed IRQ

Sun Apr 15 00:12:52 CEST 2012

On Wed, Apr 11, 2012 at 01:32:55PM -0700, Ben Widawsky wrote:
> On Wed, 11 Apr 2012 09:18:15 +0100
> Chris Wilson <chris at chris-wilson.co.uk> wrote:
> 
> > On Tue, 10 Apr 2012 16:59:11 -0700, Ben Widawsky <ben at bwidawsk.net> wrote:
> > > On Tue, 10 Apr 2012 17:00:41 +0100
> > > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > 
> > > > On the first instance we just wish to kick the waiters and see if that
> > > > terminates the wait conditions. If it does not, then we do not want to
> > > > keep retrying without ever making any forward progress and becoming
> > > > stuck in a hangcheck loop.
> > > > 
> > > > Reported-and-tested-by: Lukas Hejtmanek <xhejtman at fi.muni.cz>
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48209
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > 
> > > I'm still confused about the problem we are purportedly fixing.
> > > 
> > > This should happen if we've missed an IRQ (or the watchdog fired too
> > > soon), and then fires again before the thread has actually woken up to
> > > realize that it missed the first IRQ?
> > > 
> > > As for extracting the kick_ring bit of code for core hangcheck_elapsed,
> > > that looks fine. I just don't quite understand the exact problem this
> > > solves, and can't envision how we hit the case that the patch seems to
> > > fix.
> > 
> > Sure, just look at the bug report for the garbage we wrote into the
> > ringbuffers and how we ended up in an indefinite wait. This is not defense
> > against normal behaviour but against the driver screwing up.
> > -Chris
> > 
> 
> In that case this is
> Reviewed-by: Ben Widawsky <ben at bwidawsk.net>
> 
> Though I am still pretty surprised that we have even seen this :|

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48
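
For readers following the thread, the behaviour described in the commit
message (kick the waiters on the first missed IRQ, escalate if the same
condition repeats) can be modelled roughly as below. This is only a sketch
under stated assumptions, not the code from the patch: struct
hangcheck_model, enum hangcheck_action and hangcheck_tick() are hypothetical
names invented for illustration, and the real driver works on its own ring
and wait-queue state rather than two booleans.

```c
#include <stdbool.h>

/* Hypothetical model of the hangcheck timer state; not the i915 structures. */
struct hangcheck_model {
	unsigned int stalled_ticks; /* consecutive ticks with a stuck waiter */
};

/* What the (hypothetical) timer callback decides to do on each tick. */
enum hangcheck_action {
	HC_NONE,         /* nothing suspicious, reset the counter */
	HC_KICK_WAITERS, /* first occurrence: wake the waiters and re-check later */
	HC_DECLARE_HANG, /* repeated occurrence: stop retrying, treat as a hang */
};

/*
 * Assumption: the caller reports whether the ring looks idle and whether
 * someone is still sleeping on it. An idle ring with sleepers is taken to
 * mean a missed IRQ.
 */
static enum hangcheck_action
hangcheck_tick(struct hangcheck_model *hc, bool ring_idle, bool waiters_pending)
{
	if (!(ring_idle && waiters_pending)) {
		hc->stalled_ticks = 0;
		return HC_NONE;
	}

	/* First time round we only kick the waiters. */
	if (hc->stalled_ticks++ == 0)
		return HC_KICK_WAITERS;

	/* The kick did not end the wait; do not loop without progress. */
	return HC_DECLARE_HANG;
}
```

In this model, two consecutive ticks that both see an idle ring with
sleepers yield HC_KICK_WAITERS and then HC_DECLARE_HANG, while any tick that
sees the condition cleared resets the counter, so an isolated missed IRQ
only ever results in a kick.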