[Intel-gfx] [PATCH] i915: add error detection & state dumping

Jesse Barnes jbarnes at virtuousgeek.org
Tue Apr 21 17:45:00 CEST 2009


On Tue, 21 Apr 2009 09:11:38 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:

> On Mon, 2009-04-20 at 18:59 -0700, Eric Anholt wrote:
> > On Mon, 2009-04-20 at 18:53 -0700, Jesse Barnes wrote:
> > > On Mon, 20 Apr 2009 18:37:54 -0700
> > > Eric Anholt <eric at anholt.net> wrote:
> > > > Having had problems with the interrupt handler part of error
> > > > detection before, I'm pretty wary until we've triggered a *lot*
> > > > of errors with it. But I'd love to pull a patch that was just
> > > > the debugfs bits.
> > > 
> > > What did you run into?  An unceasing flood of error interrupts or
> > > something else?
> > > 
> > > FWIW it doesn't trigger in normal operation (at least
> > > not apparently on my 965).  I guess we could make it a module
> > > option or add a count if you want, but my eventual intent would
> > > be to catch the first real error and take some action on it.  But
> > > that will only work if the error detection is precise...
> > 
> > It was complete system lockup reports from people testing the
> > branch I'd done for error reporting.
> 
> Been there, done that, posted workaround. ;-)
> 
> This was the essential part to clear the persistent interrupt on my
> i915:
> 	I915_WRITE(EIR, eir);
> 	eir = I915_READ16(EIR);
> 	if (eir != 0) {
> 		DRM_INFO("Potential un-cleared error bits: 0x%04x, "
> 			 "disabling.\n",
> 			 eir);
> 
> 		I915_WRITE16(EMR, I915_READ(EMR) | eir);
> 
> 		/* Clear the Master Error bit as well, since the EIR != 0 */
> 		I915_WRITE(IIR, I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT);
> 	}

So you added this to the post-ack part of the handler?  I guess that's
fine, though in postinstall I think I mask everything but the EIR bits
we actually handle.  There was also some ambiguity about how to clear
error interrupt sources.  I think we're supposed to ack them like other
interrupts, but a ring error, for example, might leave no bits set in
IPEIR and still generate an error.  I'll check for updated docs here;
maybe we're just missing some other part of the ack protocol.
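
To make the ack-then-mask idea concrete, here is a rough sketch against a
toy register model.  It is not driver code.  The struct, the function
names, and the "stuck" field are all made up for illustration, and the
model assumes that EIR is write-1-to-clear, that a stuck source re-latches
right after the ack, and that only unmasked EIR bits raise the master
error interrupt.  Given the ambiguity above, real hardware may well
behave differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the error registers (illustration only). */
struct fake_regs {
	uint16_t eir;   /* latched error bits */
	uint16_t emr;   /* error mask: 1 = source masked */
	uint16_t stuck; /* assumed: sources that re-assert right after an ack */
};

/* Assumed write-1-to-clear: written bits clear, stuck sources latch again. */
void eir_write(struct fake_regs *r, uint16_t val)
{
	assert(r != NULL);
	r->eir &= (uint16_t)~val;
	r->eir |= r->stuck;
}

/* Assumed: the master error interrupt follows the unmasked EIR bits. */
int error_irq_pending(const struct fake_regs *r)
{
	assert(r != NULL);
	return (r->eir & ~r->emr) != 0;
}

/*
 * Ack every pending error, read EIR back, and mask whatever refused to
 * clear so it cannot keep the interrupt asserted.  Returns the bits that
 * were newly masked.
 */
uint16_t ack_errors(struct fake_regs *r)
{
	uint16_t eir;

	assert(r != NULL);
	eir = r->eir;
	eir_write(r, eir);   /* ack everything we saw */
	eir = r->eir;        /* read back: anything left did not clear */
	if (eir != 0)
		r->emr |= eir; /* give up on those sources and mask them */
	return eir;
}
```

Under those assumptions, starting from eir = 0x0011 with bit 4 stuck,
ack_errors() returns 0x0010, leaves emr = 0x0010, and
error_irq_pending() reports 0 afterwards, so the handler is not
re-entered for that source.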

-- 
Jesse Barnes, Intel Open Source Technology Center


