[Intel-gfx] [PATCH v2] tests/gem_error_capture: Initial testcase for error state capture/dump

Chris Wilson chris at chris-wilson.co.uk
Mon Apr 14 12:30:57 CEST 2014


On Mon, Apr 14, 2014 at 10:23:20AM +0000, Mateo Lozano, Oscar wrote:
> > > > > +				&requests);
> > > > > +		if (req_matched == 1) {
> > > > > +			igt_assert(strstr(ring_name, expected_ring_name));
> > > > > +			igt_assert(requests == 1);
> > > >
> > > > Bad assumption. You could still have the request from
> > > > gem_quiescent_gpu() which may not have been retired before the error
> > > > triggers.
> > >
> > > Hmmmm... but I make a first valid gem_execbuf()+gem_sync() before the
> > one that actually hangs. Shouldn´t that make sure that all previous requests
> > have been retired?
> > 
> > No, I would not make that assumption about kernel behaviour. The only thing
> > that it will guarantee is that the last request with that handle is retired. (In
> > other words, forthcoming bug fixes already break this
> > assumption.)
> 
> Ok, then I´ll check all the reported requests (can I at least assume that the one I am interested in will be the last one? request_list seems to be FIFO...).

In general, even that may be dangerous. (But less so if can be sure that
is no batchbuffer injection from the testsuite, and that there are no
other clients, or that the kernel doesn't start having to do its own
w/a.)

> > > > > +
> > > > > +			igt_assert(getline(&line, &line_size, file) > 0);
> > > > > +			items = sscanf(line, "  seqno 0x%08x, emitted %ld, tail
> > > > 0x%08x\n",
> > > > > +					&seqno, &jiffies, &tail);
> > > > > +			igt_assert(items == 3);
> > > >
> > > > Bad. I may change the format. s/may/will/
> > >
> > > I still need to make sure I got a valid tail, so I need to know the format and
> > fail if don´t understand it. Or do you want me to use a different tail, maybe the
> > ringbuffer one?
> > 
> > I didn't spot where you used tail. Care to elaborate? I may be able to suggest an
> > alternative, or we may either code this more defensively or make the kernel
> > emission easier to use and hopefully more informative for debugging.
> 
> I´m using the request tail to traverse the ring buffer, make sure there is a MI_BB_START and then our object´s address shortly before that tail. I could use ring->tail, or I could skip this check altogether (after all, I´m already checking that the GPU was executing a Batch Buffer, and that the BB had my magic number on it).

I would add a little more smarts to both the kernel and error-decode.
In the kernel, we can print the guilty request, which you can then use
to confirm that it is yours. That seems to me to be a stronger
validation of gem_error_capture, and a useful bit of information from
hangstats that we do not expose currently.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre



More information about the Intel-gfx mailing list