[Intel-gfx] [PATCH v2 1/3] drm/i915: Record the ringbuffer associated with the request

Mon Dec 14 03:28:35 PST 2015

On Mon, Dec 14, 2015 at 11:14:31AM +0000, Dave Gordon wrote:
> On 11/12/15 22:59, Chris Wilson wrote:
> >The request tells us where to read the ringbuf from, so use that
> >information to simplify the error capture. If no request was active at
> >the time of the hang, the ring is idle and there is no information
> >inside the ring pertaining to the hang.
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_gpu_error.c | 29 ++++++++++-------------------
> >  1 file changed, 10 insertions(+), 19 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> >index 3e137fc701cf..6eefe9c36931 100644
> >--- a/drivers/gpu/drm/i915/i915_gpu_error.c
> >+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> >@@ -995,7 +995,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
> >
> >  	for (i = 0; i < I915_NUM_RINGS; i++) {
> >  		struct intel_engine_cs *ring = &dev_priv->ring[i];
> >-		struct intel_ringbuffer *rbuf;
> >+		struct intel_ringbuffer *rbuf = NULL;
> >
> >  		error->ring[i].pid = -1;
> >
> >@@ -1039,26 +1039,17 @@ static void i915_gem_record_rings(struct drm_device *dev,
> >  				}
> >  				rcu_read_unlock();
> >  			}
> >+
> >+			rbuf = request->ringbuf;
> >  		}
> >
> >-		if (i915.enable_execlists) {
> >-			/* TODO: This is only a small fix to keep basic error
> >-			 * capture working, but we need to add more information
> >-			 * for it to be useful (e.g. dump the context being
> >-			 * executed).
> >-			 */
> >-			if (request)
> >-				rbuf = request->ctx->engine[ring->id].ringbuf;
> >-			else
> >-				rbuf = ring->default_context->engine[ring->id].ringbuf;
> >-		} else
> >-			rbuf = ring->buffer;
> >-
> >-		error->ring[i].cpu_ring_head = rbuf->head;
> >-		error->ring[i].cpu_ring_tail = rbuf->tail;
> >-
> >-		error->ring[i].ringbuffer =
> >-			i915_error_ggtt_object_create(dev_priv, rbuf->obj);
> >+		if (rbuf) {
> >+			error->ring[i].cpu_ring_head = rbuf->head;
> >+			error->ring[i].cpu_ring_tail = rbuf->tail;
> >+			error->ring[i].ringbuffer =
> >+				i915_error_ggtt_object_create(dev_priv,
> >+							      rbuf->obj);
> >+		}
> >
> >  		error->ring[i].hws_page =
> >  			i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
> 
> I think the code you deleted is intended to capture the *default*
> ringbuffer if there is no request active -- sometimes we will switch
> an engine to the default context (and therefore ringbuffer) when
> there's no work to be done.

So answer the question, why? I don't have a use for it. This code in
particular is nothing more than a hack for execlists and in no way
reflects my intentions for the postmortem debugging tool.

> Another option that's sometimes useful is to capture the ringbuffer
> pointed to by the START register. This helps in finding situations
> where the driver and the GPU disagree about what should be in
> progress.

That is a possibitly, except is no more interesting than inspecting the
START vs expected (and requires the stop_machine rework to walk the
lists without crashing).

> I've got a few patches that update some of the error capture that's
> always been missing in execlist mode (like, actually capturing the
> active context), and add some more decoding of what we do capture.

No decoding. That is easier done in userspace. And I sent patches to do
more capturing many, many months ago, along with fixing up most of the
invalid ppgtt state.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre