[Intel-gfx] [PATCH v2 1/3] drm/i915: Record the ringbuffer associated with the request

Dave Gordon david.s.gordon at intel.com
Mon Dec 14 03:14:31 PST 2015


On 11/12/15 22:59, Chris Wilson wrote:
> The request tells us where to read the ringbuf from, so use that
> information to simplify the error capture. If no request was active at
> the time of the hang, the ring is idle and there is no information
> inside the ring pertaining to the hang.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gpu_error.c | 29 ++++++++++-------------------
>   1 file changed, 10 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 3e137fc701cf..6eefe9c36931 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -995,7 +995,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
>
>   	for (i = 0; i < I915_NUM_RINGS; i++) {
>   		struct intel_engine_cs *ring = &dev_priv->ring[i];
> -		struct intel_ringbuffer *rbuf;
> +		struct intel_ringbuffer *rbuf = NULL;
>
>   		error->ring[i].pid = -1;
>
> @@ -1039,26 +1039,17 @@ static void i915_gem_record_rings(struct drm_device *dev,
>   				}
>   				rcu_read_unlock();
>   			}
> +
> +			rbuf = request->ringbuf;
>   		}
>
> -		if (i915.enable_execlists) {
> -			/* TODO: This is only a small fix to keep basic error
> -			 * capture working, but we need to add more information
> -			 * for it to be useful (e.g. dump the context being
> -			 * executed).
> -			 */
> -			if (request)
> -				rbuf = request->ctx->engine[ring->id].ringbuf;
> -			else
> -				rbuf = ring->default_context->engine[ring->id].ringbuf;
> -		} else
> -			rbuf = ring->buffer;
> -
> -		error->ring[i].cpu_ring_head = rbuf->head;
> -		error->ring[i].cpu_ring_tail = rbuf->tail;
> -
> -		error->ring[i].ringbuffer =
> -			i915_error_ggtt_object_create(dev_priv, rbuf->obj);
> +		if (rbuf) {
> +			error->ring[i].cpu_ring_head = rbuf->head;
> +			error->ring[i].cpu_ring_tail = rbuf->tail;
> +			error->ring[i].ringbuffer =
> +				i915_error_ggtt_object_create(dev_priv,
> +							      rbuf->obj);
> +		}
>
>   		error->ring[i].hws_page =
>   			i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);

I think the code you deleted was intended to capture the *default*
ringbuffer if there is no request active -- sometimes we will switch an
engine to the default context (and therefore its ringbuffer) when there's
no work to be done.
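
To make the fallback concrete, here is a rough standalone sketch of the
selection I have in mind. It is not driver code: the struct and function
names are simplified stand-ins I made up for illustration, and it assumes
the request carries a ringbuf pointer as in your patch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct ringbuf { unsigned int head, tail; };
struct request { struct ringbuf *ringbuf; };
struct engine  { struct ringbuf *default_ringbuf; };

/*
 * Prefer the ringbuffer of the active request; if no request is
 * active, fall back to the engine's default-context ringbuffer
 * rather than capturing nothing.
 */
static struct ringbuf *pick_capture_ringbuf(const struct engine *engine,
					    const struct request *request)
{
	if (request && request->ringbuf)
		return request->ringbuf;

	return engine->default_ringbuf;
}
```

With something like this, the capture would only be skipped if the engine
has no default ringbuffer at all.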

Another option that's sometimes useful is to capture the ringbuffer 
pointed to by the START register. This helps in finding situations where 
the driver and the GPU disagree about what should be in progress.
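
As a sketch of that idea (again standalone, with hypothetical names; the
real code would read the START register and walk the driver's own list of
ringbuffers), the lookup and the disagreement check could look roughly like
this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a ringbuffer and its GGTT start address. */
struct ringbuf_info { uint32_t ggtt_start; };

/*
 * Find the ringbuffer whose GGTT start address matches the value read
 * from the START register. Returns NULL if none of the known
 * ringbuffers match.
 */
static const struct ringbuf_info *
find_ringbuf_by_start(const struct ringbuf_info *bufs, size_t count,
		      uint32_t hw_start)
{
	for (size_t i = 0; i < count; i++)
		if (bufs[i].ggtt_start == hw_start)
			return &bufs[i];

	return NULL;
}

/*
 * Nonzero if the ringbuffer the driver expects to be running is not
 * the one the hardware points at. A NULL 'expected' (driver has no
 * ringbuffer in mind) is also reported as a disagreement here.
 */
static int start_disagrees(const struct ringbuf_info *expected,
			   uint32_t hw_start)
{
	return expected == NULL || expected->ggtt_start != hw_start;
}
```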

I've got a few patches that add some of the error capture that has
always been missing in execlists mode (such as actually capturing the
active context), and add some more decoding of what we do capture.
John Harrison posted them last week as part of the "Preemption support
for GPU scheduler" patchset, although they don't really have anything
to do with preemption per se.

One of them, "drm/i915/error: improve CSB reporting", also updates this
area of the code, so maybe I should incorporate your change into the
next revision of that patch?

.Dave.

