[Intel-gfx] [PATCH] drm/i915: Stop gathering error states for CS error interrupts

Daniel Vetter daniel at ffwll.ch
Mon Nov 24 21:57:32 CET 2014


On Wed, Nov 05, 2014 at 10:56:06AM +0100, Daniel Vetter wrote:
> On Wed, Nov 05, 2014 at 08:35:01AM +0000, Chris Wilson wrote:
> > On Tue, Nov 04, 2014 at 03:52:22PM +0100, Daniel Vetter wrote:
> > > There's quite a few bug reports with error states where the error
> > > reasons make just about no sense at all. Like dying on tlbs for a
> > > display plane that's not even there. Also users don't really report a
> > > lot of bad side effects generally, just the error states.
> > > 
> > > Furthermore we don't even enable these interrupts any more on gen5+
> > > (though the handling code is still there). So this mostly concerns old
> > > platforms.
> > > 
> > > Given all that let's make our lives a bit easier and stop capturing
> > > error states, in the hopes that we can just ignore them. In case
> > > that's not true and the gpu indeed dies the hangcheck should
> > > eventually kick in. And I've left some debug log in to make this case
> > > noticeable. Referenced bug is just an example.
> > 
> > The problem is they can be useful. They have shown when our modesetting
> > sequence has been completely snafu, and they can also be used to detect
> > page faults (but that does require a bit of kernel trickery) in
> > userspace GPU command streams. Even in the Display B on 845g, we must
> > have done something to upset the hardware, but we simply haven't
> > captured what. I am not yet convinced we want to throw all such reports
> > away, in case we do ignore genuine fail.
> > 
> > How about just toning down the error message for non-fatal faults, and
> > discarding the earlier error state should we get a fatal fault afterwards?
> 
> Hm yeah, that might work too.

I looked at this and it gets ugly fast. Given that we seem to have quite a
substantial false-positive rate (I found one more just by reading recent
bug spam) and haven't enabled this on gen5+, I've decided to just merge
this one here. With the missing \n added ofc.

We can still inject manual captures using debugfs, and wiring this up
again if it indeed proves useful should be quick.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


