[Intel-gfx] [PATCH v2] tests/gem_error_capture: Initial testcase for error state capture/dump

Daniel Vetter daniel at ffwll.ch
Tue Apr 15 21:38:27 CEST 2014


On Mon, Apr 14, 2014 at 01:03:58PM +0000, Mateo Lozano, Oscar wrote:
> > I would add a little more smarts to both the kernel and error-decode.
> > In the kernel, we can print the guilty request, which you can then use to
> > confirm that it is yours. That seems to me to be a stronger validation of
> > gem_error_capture, and a useful bit of information from hangstats that we do
> > not expose currently.
> 
> That sounds good. I have to add a number of other things to
> i915_gpu_error as part of the Execlists code, so I'll add a "--- guilty
> request" as well and resubmit this test together with the series.

If we want this much smarts then we need a properly hanging batch, e.g.
like the looping batch used in gem_reset_stats.

The problem with that is that this will kill the gpu if reset doesn't work
(i.e. gen2/3) so we need to skip this test there. Or maybe split things
into 2 subtests and use the properly hanging batch only when we do the
extended guilty testing under discussion here.

But in any case just checking that the batch is somewhere in the ring
(properly masking the lower bits 0-11 ofc) and checking whether the batch
is correctly dumped (with the magic value) would catch a lot of the
past & present execbuf bugs - we've had issues with dumping fancy values of
0 a lot.
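
A rough sketch of those two checks in C, assuming the ring and batch
contents have already been parsed out of the error state into dword
arrays. The helper names, the array-based interface and the MAGIC value
are all made up for illustration; none of this is the actual
gem_error_capture code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern the test would write into its batch. */
#define MAGIC 0xdeadbeefu

/* Bits 0-11 of a batch start dword are not part of the page address. */
#define LOW_BITS_MASK 0xfffu

/*
 * Return true if any dword in the dumped ring, with bits 0-11 masked
 * off, equals the (page-aligned) batch offset. Note that a batch at
 * offset 0 would match any small dword, so a real test should avoid
 * relying on that case.
 */
static bool ring_references_batch(const uint32_t *ring, size_t ndwords,
                                  uint32_t batch_offset)
{
	uint32_t want = batch_offset & ~LOW_BITS_MASK;
	size_t i;

	for (i = 0; i < ndwords; i++) {
		if ((ring[i] & ~LOW_BITS_MASK) == want)
			return true;
	}
	return false;
}

/*
 * Return true if every dword of the dumped batch carries the magic
 * value. An empty dump or a dump full of zeros fails the check.
 */
static bool batch_dump_is_magic(const uint32_t *batch, size_t ndwords)
{
	size_t i;

	if (ndwords == 0)
		return false;

	for (i = 0; i < ndwords; i++) {
		if (batch[i] != MAGIC)
			return false;
	}
	return true;
}
```

The second helper is the one that would trip on the "dumping 0" class of
bugs: a batch dumped as all zeros never matches the magic value.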

For the guilty stuff we have an extensive set of tests in gem_reset_stats
using the reset stat ioctl already. And for the occasional "the hang
detection logic is busted" bug I think nothing short of a human brain
looking at the batch really helps. At least if we want to be somewhat
platform agnostic ...

So imo the current level of checking looks Good Enough. But I'm certainly
not going to stop you ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


