[Intel-gfx] [PATCH] drm/i915: add hang detection and reset

Jesse Barnes jbarnes at virtuousgeek.org
Wed Jun 24 01:13:38 CEST 2009


On Tue, 23 Jun 2009 16:06:04 -0700
Eric Anholt <eric at anholt.net> wrote:

> On Fri, 2009-06-19 at 13:08 -0700, Jesse Barnes wrote:
> > This patch adds a hangcheck timer, which is set to a 5s timeout at
> > each batch execution, and cleared when a sequence number passes or when
> > a leave VT event happens.  It also adds infrastructure for saving &
> > restoring just the display state, in the event we need to reset the
> > display engine too (though that's currently unused).
> > 
> > If the timer fires, the hangcheck function captures some error state
> > and attempts a reset (on 965+ only thus far), and generates a
> > uevent. It's enough to recover from the gem_hang test case,
> > allowing me to start a fresh desktop after the hang has been
> > detected and recovered from, which wasn't possible before.
> > 
> > Wider testing would be much appreciated, with other types of hangs.
> > And of course review is always welcome.  I'm hoping this is safe to
> > include in 2.6.31 since it shouldn't make things any worse than they
> > are today.
> 
> So, 5s will be too short.  I've seen the Mesa trispd demo take up to
> 15s for a single batchbuffer.  I'm thinking we'll need to track the
> intra-batchbuffer head pointer when the timeout fires and reset if
> it's moved.

Hm, and I was thinking 5s would be too long, but ok, I'll see if I can
change the way we cancel the timer.
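
To make the idea concrete, here is a rough sketch of the progress check
Eric describes. It is not code from the patch. It reads "reset if it's
moved" as "re-arm the timer if the head pointer has moved", and the
struct, enum and function names are all made up for illustration. How
the head pointer is read from the hardware is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one hangcheck timer; not the patch's struct. */
struct hangcheck_state {
	uint32_t last_head;   /* head pointer sampled at the previous timeout */
	bool     have_sample; /* false until a timeout has recorded a sample */
};

enum hangcheck_verdict {
	HANGCHECK_REARM, /* head moved (or first sample): re-arm the timer */
	HANGCHECK_HUNG   /* head unchanged across two timeouts: treat as hung */
};

/*
 * Assumed to be called when a new batch is submitted or a sequence number
 * passes. Any older head sample is stale at that point, so drop it.
 */
static void hangcheck_progress(struct hangcheck_state *hc)
{
	hc->have_sample = false;
}

/*
 * Assumed to be called from the timer callback with the current
 * intra-batchbuffer head pointer. Reports a hang only if the head has not
 * moved since the previous timeout; otherwise records the new position and
 * asks the caller to re-arm the timer.
 */
static enum hangcheck_verdict
hangcheck_timeout(struct hangcheck_state *hc, uint32_t head_now)
{
	if (hc->have_sample && hc->last_head == head_now)
		return HANGCHECK_HUNG;

	hc->last_head = head_now;
	hc->have_sample = true;
	return HANGCHECK_REARM;
}
```

With this shape a hang is only declared on the second timeout with no
head movement, so a long-running batch like the trispd case keeps
re-arming the timer as long as the head advances, and the interval
itself could probably stay short.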

-- 
Jesse Barnes, Intel Open Source Technology Center


