[Intel-gfx] [PATCH] i915: add error detection & state dumping
eric at anholt.net
Tue Apr 21 03:59:33 CEST 2009
On Mon, 2009-04-20 at 18:53 -0700, Jesse Barnes wrote:
> On Mon, 20 Apr 2009 18:37:54 -0700
> Eric Anholt <eric at anholt.net> wrote:
> > On Mon, 2009-04-20 at 15:38 -0700, Jesse Barnes wrote:
> > > Add error state detection and state dumping to the i915 driver.
> > >
> > > This is still pretty rudimentary, since it just dumps error state at
> > > detect time or when the i915_error_state file from debugfs is
> > > read. To really figure things out it would be good to track the
> > > PGTBL_ER offset back to its originating batchbuffer and process,
> > > and save the batchbuffer for later fetch & decode by the dumping
> > > tool.
> > >
> > > We'd also like to be able to recover from errors by killing the
> > > offending process and/or resetting the chip as needed.
> > >
> > > I've tested this on 965, but re-reviewed the offsets for pre-965
> > > (good thing I did: all the debug registers moved), so it should
> > > work on earlier chips too. However, I've only successfully dumped
> > > instruction parse errors; it would be good to get additional
> > > testing for pipe underruns & page table errors.
> > Having had problems with the interrupt handler part of error detection
> > before, I'm pretty wary until we've triggered a *lot* of errors with
> > it. But I'd love to pull a patch that was just the debugfs bits.
> What did you run into? An unceasing flood of error interrupts or
> something else?
> FWIW it doesn't trigger in normal operation (at least
> not on my 965, apparently). I guess we could make it a module option or
> add a count if you want, but my eventual intent would be to catch the
> first real error and take some action on it. But that will only work
> if the error detection is precise...
There were reports of complete system lockups from people testing the
branch I'd done for error reporting.
> Either way, feel free to trim the i915_irq part of it if you want,
> since the debugfs part of it is useful by itself.
Will do. I'd like to see the IRQ bits once they've sat for a while, but
getting debug info into the kernel means we can parse it and make
intel_gpu_dump better, which I want ASAP.
eric at anholt.net eric.anholt at intel.com
More information about the Intel-gfx