[Intel-gfx] [PATCH] drm/i915: Stop gathering error states for CS error interrupts

Chris Wilson chris at chris-wilson.co.uk
Wed Nov 5 09:35:01 CET 2014


On Tue, Nov 04, 2014 at 03:52:22PM +0100, Daniel Vetter wrote:
> There are quite a few bug reports with error states where the error
> reason makes just about no sense at all. Like dying on tlbs for a
> display plane that's not even there. Also users don't really report a
> lot of bad side effects generally, just the error states.
> 
> Furthermore we don't even enable these interrupts any more on gen5+
> (though the handling code is still there). So this mostly concerns old
> platforms.
> 
> Given all that let's make our lives a bit easier and stop capturing
> error states, in the hopes that we can just ignore them. In case
> that's not true and the gpu indeed dies the hangcheck should
> eventually kick in. And I've left some debug log in to make this case
> noticeable. Referenced bug is just an example.

The problem is they can be useful. They have shown when our modesetting
sequence has been completely snafu, and they can also be used to detect
page faults in userspace GPU command streams (though that does require a
bit of kernel trickery). Even in the Display B case on 845g, we must
have done something to upset the hardware, but we simply haven't
captured what. I am not yet convinced we want to throw all such reports
away, in case we end up ignoring a genuine failure.

How about just toning down the error message for non-fatal faults, and
discarding the earlier error state should we get a fatal fault afterwards?
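
A minimal sketch of that policy, to make the suggestion concrete. All
names here (struct error_state, record_error, first_error) are
hypothetical stand-ins and not the actual i915 code; the only
assumption is that the caller can tell whether a fault is fatal.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a captured error state (not the i915 struct). */
struct error_state {
	bool valid;        /* a state is currently held */
	bool fatal;        /* the held state came from a fatal fault */
	unsigned int eir;  /* raw error bits reported with the interrupt */
};

struct error_state first_error;

/*
 * Record an error, keeping at most one state.
 * Non-fatal faults are logged quietly and never displace a held state.
 * A fatal fault displaces an earlier non-fatal state, but not a fatal one.
 * Returns true if the new state was stored.
 */
bool record_error(unsigned int eir, bool fatal)
{
	/* Toned-down message for non-fatal faults, loud one for fatal. */
	if (fatal)
		fprintf(stderr, "ERROR: fatal GPU fault, EIR 0x%08x\n", eir);
	else
		fprintf(stderr, "debug: non-fatal GPU fault, EIR 0x%08x\n", eir);

	/* Keep what we hold unless this is a fatal fault replacing a non-fatal one. */
	if (first_error.valid && (first_error.fatal || !fatal))
		return false;

	first_error.valid = true;
	first_error.fatal = fatal;
	first_error.eir = eir;
	return true;
}
```

With this, an early spurious report no longer hides the state captured
when the GPU actually dies later on.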
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


