[Intel-gfx] [PATCH v2 1/3] drm/i915: Record the ringbuffer associated with the request
Chris Wilson
chris at chris-wilson.co.uk
Mon Dec 14 03:28:35 PST 2015
On Mon, Dec 14, 2015 at 11:14:31AM +0000, Dave Gordon wrote:
> On 11/12/15 22:59, Chris Wilson wrote:
> >The request tells us where to read the ringbuf from, so use that
> >information to simplify the error capture. If no request was active at
> >the time of the hang, the ring is idle and there is no information
> >inside the ring pertaining to the hang.
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/i915_gpu_error.c | 29 ++++++++++-------------------
> > 1 file changed, 10 insertions(+), 19 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> >index 3e137fc701cf..6eefe9c36931 100644
> >--- a/drivers/gpu/drm/i915/i915_gpu_error.c
> >+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> >@@ -995,7 +995,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
> >
> > for (i = 0; i < I915_NUM_RINGS; i++) {
> > struct intel_engine_cs *ring = &dev_priv->ring[i];
> >- struct intel_ringbuffer *rbuf;
> >+ struct intel_ringbuffer *rbuf = NULL;
> >
> > error->ring[i].pid = -1;
> >
> >@@ -1039,26 +1039,17 @@ static void i915_gem_record_rings(struct drm_device *dev,
> > }
> > rcu_read_unlock();
> > }
> >+
> >+ rbuf = request->ringbuf;
> > }
> >
> >- if (i915.enable_execlists) {
> >- /* TODO: This is only a small fix to keep basic error
> >- * capture working, but we need to add more information
> >- * for it to be useful (e.g. dump the context being
> >- * executed).
> >- */
> >- if (request)
> >- rbuf = request->ctx->engine[ring->id].ringbuf;
> >- else
> >- rbuf = ring->default_context->engine[ring->id].ringbuf;
> >- } else
> >- rbuf = ring->buffer;
> >-
> >- error->ring[i].cpu_ring_head = rbuf->head;
> >- error->ring[i].cpu_ring_tail = rbuf->tail;
> >-
> >- error->ring[i].ringbuffer =
> >- i915_error_ggtt_object_create(dev_priv, rbuf->obj);
> >+ if (rbuf) {
> >+ error->ring[i].cpu_ring_head = rbuf->head;
> >+ error->ring[i].cpu_ring_tail = rbuf->tail;
> >+ error->ring[i].ringbuffer =
> >+ i915_error_ggtt_object_create(dev_priv,
> >+ rbuf->obj);
> >+ }
> >
> > error->ring[i].hws_page =
> > i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
>
> I think the code you deleted is intended to capture the *default*
> ringbuffer if there is no request active -- sometimes we will switch
> an engine to the default context (and therefore ringbuffer) when
> there's no work to be done.
So answer the question, why? I don't have a use for it. This code in
particular is nothing more than a hack for execlists and in no way
reflects my intentions for the postmortem debugging tool.
> Another option that's sometimes useful is to capture the ringbuffer
> pointed to by the START register. This helps in finding situations
> where the driver and the GPU disagree about what should be in
> progress.
That is a possibitly, except is no more interesting than inspecting the
START vs expected (and requires the stop_machine rework to walk the
lists without crashing).
> I've got a few patches that update some of the error capture that's
> always been missing in execlist mode (like, actually capturing the
> active context), and add some more decoding of what we do capture.
No decoding. That is easier done in userspace. And I sent patches to do
more capturing many, many months ago, along with fixing up most of the
invalid ppgtt state.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list