[Intel-gfx] [PATCH] i915: add error detection & state dumping
Jesse Barnes
jbarnes at virtuousgeek.org
Tue Apr 21 17:45:00 CEST 2009
On Tue, 21 Apr 2009 09:11:38 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Mon, 2009-04-20 at 18:59 -0700, Eric Anholt wrote:
> > On Mon, 2009-04-20 at 18:53 -0700, Jesse Barnes wrote:
> > > On Mon, 20 Apr 2009 18:37:54 -0700
> > > Eric Anholt <eric at anholt.net> wrote:
> > > > Having had problems with the interrupt handler part of error
> > > > detection before, I'm pretty wary until we've triggered a *lot*
> > > > of errors with it. But I'd love to pull a patch that was just
> > > > the debugfs bits.
> > >
> > > What did you run into? An unceasing flood of error interrupts or
> > > something else?
> > >
> > > FWIW it doesn't trigger in normal operation (at least
> > > not apparently on my 965). I guess we could make it a module
> > > option or add a count if you want, but my eventual intent would
> > > be to catch the first real error and take some action on it. But
> > > that will only work if the error detection is precise...
> >
> > It was complete system lockup reports from people testing the
> > branch I'd done for error reporting.
>
> Been there, done that, posted workaround. ;-)
>
> This was the essential part to clear the persistent interrupt on my
> i915:
>         I915_WRITE(EIR, eir);
>         eir = I915_READ16(EIR);
>         if (eir != 0) {
>                 DRM_INFO("Potential un-cleared error bits: 0x%04x, "
>                          "disabling.\n", eir);
>
>                 I915_WRITE16(EMR, I915_READ(EMR) | eir);
>
>                 /* Clear the Master Error bit as well, since the
>                    EIR != 0 */
>                 I915_WRITE(IIR,
>                            I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT);
>         }
So you added this to the post-ack part of the handler? I guess it's
fine, though in postinstall I think I mask everything but the EIR bits
we actually handle. Also, there was some ambiguity about clearing
error interrupt sources. I think we're supposed to ack them like other
interrupts, but in IPEIR, for example, a ring error might have no bits
set but still generate an error. I'll check for updated docs here;
maybe we're just missing some other part of the ack protocol.
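
For reference, here is a minimal sketch of the clear-then-mask sequence
from the quoted workaround, run against a software model instead of real
MMIO. Everything named here (struct fake_regs, its stuck field,
write_eir, write_iir, MASTER_ERROR_BIT, ack_errors) is made up for
illustration and is not driver code; only the order of operations
follows the snippet above, and the write-1-to-clear behaviour is an
assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the registers discussed above.
 * The real driver goes through I915_READ/I915_WRITE; this struct only
 * imitates write-1-to-clear behaviour for illustration. */
struct fake_regs {
	uint16_t eir;   /* error identity: write 1 to clear a bit */
	uint16_t emr;   /* error mask: 1 = error source masked */
	uint32_t iir;   /* interrupt identity: write 1 to clear a bit */
	uint16_t stuck; /* assumption: EIR bits that refuse to clear */
};

#define MASTER_ERROR_BIT (1u << 15) /* placeholder bit position */

static void write_eir(struct fake_regs *r, uint16_t val)
{
	/* Clear the written bits, except those modelled as stuck. */
	r->eir &= (uint16_t)~(val & ~r->stuck);
}

static void write_iir(struct fake_regs *r, uint32_t val)
{
	r->iir &= ~val;
}

/* Ack all pending errors, then mask whatever would not clear.
 * Returns the EIR bits that stayed set (0 if the ack fully worked). */
static uint16_t ack_errors(struct fake_regs *r)
{
	uint16_t eir;

	assert(r);
	eir = r->eir;

	write_eir(r, eir);  /* normal ack */
	eir = r->eir;       /* re-read: anything left is persistent */
	if (eir != 0) {
		r->emr |= eir;                  /* stop it from re-firing */
		write_iir(r, MASTER_ERROR_BIT); /* drop the master error bit */
	}
	return eir;
}
```

With one bit modelled as stuck, ack_errors() returns that bit and leaves
it set in both EIR and EMR; with nothing stuck it returns 0 and leaves
EMR and IIR untouched.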
--
Jesse Barnes, Intel Open Source Technology Center