[Intel-gfx] [PATCH] i915: add error detection & state dumping

Eric Anholt eric at anholt.net
Tue Apr 21 03:59:33 CEST 2009


On Mon, 2009-04-20 at 18:53 -0700, Jesse Barnes wrote:
> On Mon, 20 Apr 2009 18:37:54 -0700
> Eric Anholt <eric at anholt.net> wrote:
> 
> > On Mon, 2009-04-20 at 15:38 -0700, Jesse Barnes wrote:
> > > Add error state detection and state dumping to the i915 driver.
> > > 
> > > This is still pretty rudimentary, since it just dumps error state at
> > > detect time or when the i915_error_state file from debugfs is
> > > read.  To really figure things out it would be good to track the
> > > PGTBL_ER offset back to its originating batchbuffer and process,
> > > and save the batchbuffer for later fetch & decode by the dumping
> > > tool.
> > > 
> > > We'd also like to be able to recover from errors by killing the
> > > offending process and/or resetting the chip as needed.
> > > 
> > > I've tested this on 965, but re-reviewed the offsets for pre-965
> > > (good thing I did, all the debug registers moved), so it should
> > > work on earlier chips too.  However, I've only successfully dumped
> > > instruction parse errors; it would be good to get additional
> > > testing for pipe underruns & page table errors.
> > 
> > Having had problems with the interrupt handler part of error detection
> > before, I'm pretty wary until we've triggered a *lot* of errors with
> > it. But I'd love to pull a patch that was just the debugfs bits.
> 
> What did you run into?  An unceasing flood of error interrupts or
> something else?
> 
> FWIW it doesn't trigger in normal operation (at least not apparently on
> my 965).  I guess we could make it a module option or
> add a count if you want, but my eventual intent would be to catch the
> first real error and take some action on it.  But that will only work
> if the error detection is precise...

It was reports of complete system lockups from people testing the branch
I'd done for error reporting.

> Either way, feel free to trim the i915_irq part of it if you want,
> since the debugfs part of it is useful by itself.

Will do.  I'd like to see the IRQ bits once they've sat for a while, but
getting debug info into the kernel means we can parse it and make
intel_gpu_dump better, which I want ASAP.

-- 
Eric Anholt
eric at anholt.net                         eric.anholt at intel.com

