[Intel-gfx] [PATCH 1/3] drm/i915: Rely on accurate request tracking for finding hung batches

Wed Feb 12 09:25:58 CET 2014

On Tue, Feb 11, 2014 at 05:57:19PM -0800, Ben Widawsky wrote:
> On Mon, Feb 10, 2014 at 04:30:49PM +0200, Mika Kuoppala wrote:
> > -static struct drm_i915_gem_request *
> > -i915_gem_find_first_non_complete(struct intel_ring_buffer *ring)
> > +struct drm_i915_gem_request *
> > +i915_gem_find_active_request(struct intel_ring_buffer *ring)
> >  {
> >  	struct drm_i915_gem_request *request;
> > -	const u32 completed_seqno = ring->get_seqno(ring, false);
> > +	u32 completed_seqno;
> > +
> > +	if (WARN_ON(!ring->get_seqno))
> > +		return NULL;
> 
> ring->get_seqno(ring, false) ?

?

This was originally used in the error capture code to detect
uninitialised rings.

> This looks like it's currently wrong below as well.

?

> I still would have preferred to keep both methods for a while until we
> felt really comfortable with this... the commit message's statement that
> getting bad error state is inevitably a good thing is total bunk, I
> think.

I strongly disagree. One of the critical factors you first scan for when
reading error states is whether it is self-consistent. If the error
state is inconsistent, we also know that some of our critical
infrastructure is wrong - which we can't know if the error capture code
was different. And in the cases where the error state is inconsistent,
the scanning method would also be snafu (because ACTHD is no longer
within the expected bounds).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre