[Intel-gfx] [PATCH 2/2] drm/i915: Capture current context on error

Chris Wilson chris at chris-wilson.co.uk
Sun Feb 24 10:34:30 CET 2013


On Sat, Feb 23, 2013 at 05:30:10PM -0800, Ben Widawsky wrote:
> On error, this represents the state of the currently running context at
> the time it was loaded.
> 
> Unfortunately, since we're hung and can't switch out the context this
> may not tell us too much about the most current state of the context,
> but does give clues about what has happened since loading.
> 
> Thanks to recent doc updates, we have a little more confidence regarding
> what is actually in this memory, and perhaps it will help us gain more
> insight into certain bugs. AFAICT, the most interesting info is in the
> first page. To save space, we only capture the first page. In the
> future, we might want to dump more.
> 
> Sample of the relevant part of error state:
> --- HW Context = 0x01b20000
> 00000000 :  00000000 1100105f 00002028 ffff0880
> 00000010 :  0000209c feff4040 000020c0 efdf0080
> 00000020 :  00002178 00000001 0000217c 00145855
> 00000030 :  00002310 00000000 00002314 00000000
> 00000040 :  00002318 00000000 0000231c 00000000

Presentation looks reasonable, except it will confuse
intel_error_decode as it will match "%x : %x". How about
"[%03x] %08x %08x %08x %08x"?
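
To illustrate the clash, here is a minimal sketch. It assumes the decoder recognises batch dump lines with an sscanf-style "%x : %x" match; the helper names looks_like_offset_value() and format_ctx_row() are hypothetical and exist only for this example.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the decoder's line matcher: returns 1 when a
 * line parses as "<hex offset> : <hex value>", 0 otherwise.
 */
static int looks_like_offset_value(const char *line)
{
	unsigned int offset, value;

	return sscanf(line, "%x : %x", &offset, &value) == 2;
}

/*
 * Format one row of four dwords with the bracketed offset layout
 * suggested above. Returns the snprintf() result.
 */
static int format_ctx_row(char *buf, size_t len, unsigned int offset,
			  const unsigned int dw[4])
{
	return snprintf(buf, len, "[%03x] %08x %08x %08x %08x",
			offset, dw[0], dw[1], dw[2], dw[3]);
}
```

Under that assumption, a row from the sample above would be picked up by the matcher, while a row written by format_ctx_row() starts with '[' and so fails the first %x conversion.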

> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=55845
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> ---

> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e95337c..ab88620 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -209,6 +209,7 @@ struct drm_i915_error_state {
>  	u32 pgtbl_er;
>  	u32 ier;
>  	u32 ccid;
> +	struct drm_i915_error_object *ctx_obj;

Put it next to the other pointers, unless we want to start digging
holes in the struct.
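
A rough sketch of the padding concern, using two cut-down, hypothetical structs rather than the real drm_i915_error_state. The exact numbers depend on the ABI; the comments assume a typical 64-bit target where pointers are 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointer placed in the middle of a run of 32-bit fields, as in the patch. */
struct interleaved_state {
	uint32_t pgtbl_er;
	uint32_t ier;
	uint32_t ccid;
	void *ctx_obj;		/* may need padding in front of it */
	uint32_t derrmr;
	uint32_t forcewake;
};

/* Pointer grouped ahead of the 32-bit fields. */
struct grouped_state {
	void *ctx_obj;
	uint32_t pgtbl_er;
	uint32_t ier;
	uint32_t ccid;
	uint32_t derrmr;
	uint32_t forcewake;
};

/* Bytes of padding the compiler inserted between ccid and ctx_obj. */
static size_t hole_in_interleaved(void)
{
	return offsetof(struct interleaved_state, ctx_obj) -
	       (offsetof(struct interleaved_state, ccid) + sizeof(uint32_t));
}

/* Bytes of padding between ctx_obj and the first 32-bit field. */
static size_t hole_in_grouped(void)
{
	return offsetof(struct grouped_state, pgtbl_er) - sizeof(void *);
}
```

On such a 64-bit target the three u32 fields end at offset 12, so the pointer is pushed out to offset 16 and four bytes go unused; with the pointer grouped first, no padding is needed before the u32 fields.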

>  	u32 derrmr;
>  	u32 forcewake;
>  	bool waiting[I915_NUM_RINGS];
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index ebaf558..7f7d241 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1321,6 +1321,14 @@ static void i915_capture_error_state(struct drm_device *dev)
>  	error->pgtbl_er = I915_READ(PGTBL_ER);
>  	error->ccid = I915_READ(CCID);
>  
> +	if (error->ccid && !dev_priv->hw_contexts_disabled) {
> +		list_for_each_entry(obj, &dev_priv->mm.active_list, mm_list)

I am doubtful that the active list will hold the object in all cases, as
we only put the context obj onto the active list when switching away.
I'd check the gtt_list to be on the safe side. And ignore what we think
of hw_contexts_disabled - if the CCID randomly points to one of our
objects, let's attach it.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


