[Intel-gfx] [PATCH] drm/i915: Decouple GPU error reporting from ring initialisation

Chris Wilson chris at chris-wilson.co.uk
Fri Jan 24 12:55:21 CET 2014


On Fri, Jan 24, 2014 at 01:50:25PM +0200, Ville Syrjälä wrote:
> On Thu, Jan 23, 2014 at 09:49:43PM +0000, Chris Wilson wrote:
> > Currently we report through our error state only the rings that have
> > been initialised (as detected by ring->obj). This check is done after
> > the GPU reset and ring re-initialisation, which means that the software
> > state may not be the same as when we captured the hardware error and we
> > may not print out any of the vital information for debugging the hang.
> > 
> > This (and the implied object leak) is a regression from
> > 
> > commit 3d57e5bd1284f44e325f3a52d966259ed42f9e05
> > Author: Ben Widawsky <ben at bwidawsk.net>
> > Date:   Mon Oct 14 10:01:36 2013 -0700
> > 
> >     drm/i915: Do a fuller init after reset
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Ben Widawsky <ben at bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h       |  1 +
> >  drivers/gpu/drm/i915/i915_gpu_error.c | 19 +++++++++++++------
> >  2 files changed, 14 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index c45cbbecd66a..64a1aca7804d 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -334,6 +334,7 @@ struct drm_i915_error_state {
> >  	struct timeval time;
> >  
> >  	struct drm_i915_error_ring {
> > +		int valid;
> 
> bool

In a struct? I tend to think that using bool there leads to laziness:
the flags never get coalesced into bitfields.
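
To illustrate the bitfield point, here is a minimal sketch comparing
eight separate bool members with eight one-bit fields. The struct and
flag names are hypothetical and not taken from i915, and the exact
sizes depend on the ABI, so treat it as an illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags stored as individual bools: typically one byte each. */
struct ring_flags_bool {
	bool valid;
	bool waiting;
	bool hung;
	bool idle;
	bool reset_pending;
	bool has_ctx;
	bool has_batch;
	bool wedged;
};

/* The same hypothetical flags coalesced into one-bit fields. */
struct ring_flags_bits {
	unsigned int valid:1;
	unsigned int waiting:1;
	unsigned int hung:1;
	unsigned int idle:1;
	unsigned int reset_pending:1;
	unsigned int has_ctx:1;
	unsigned int has_batch:1;
	unsigned int wedged:1;
};

/* Helpers so the two layouts can be compared at run time. */
static size_t bool_flags_size(void)
{
	return sizeof(struct ring_flags_bool);
}

static size_t bit_flags_size(void)
{
	return sizeof(struct ring_flags_bits);
}
```

On common ABIs the bitfield variant should be no larger than the bool
variant, and the gap grows as more flags are added.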

> > -		obj = error->ring[i].ctx;
> > -		if (obj) {
> > +		if ((obj = error->ring[i].ctx)) {
> 
> Unrelated change. Although it does make this more consistent w/ the
> surrounding code. But I admit to not being a fan of assignments inside
> if statements.

The inconsistency was uglier.

> >  			err_printf(m, "%s --- HW Context = 0x%08x\n",
> >  				   dev_priv->ring[i].name,
> >  				   obj->gtt_offset);
> > @@ -826,11 +827,17 @@ static void i915_gem_record_rings(struct drm_device *dev,
> >  				  struct drm_i915_error_state *error)
> >  {
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > -	struct intel_ring_buffer *ring;
> >  	struct drm_i915_gem_request *request;
> >  	int i, count;
> >  
> > -	for_each_ring(ring, dev_priv, i) {
> > +	for (i = 0; i < I915_NUM_RINGS; i++) {
> > +		struct intel_ring_buffer *ring = &dev_priv->ring[i];
> > +
> > +		if (ring->dev == NULL)
> > +			continue;
> > +
> > +		error->ring[i].valid = true;
> > +
> 
> The code here runs before the reset, and it would actually oops if
> ring->obj==NULL, so using for_each_ring() here looks appropriate.

No, we need to record that ring->obj is NULL, especially if the ring
registers are still set...
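
A minimal sketch of that intent, using simplified stand-in structs
rather than the real i915 types (model_ring, model_error_ring,
record_rings and NUM_RINGS are all hypothetical names): every ring
slot that belongs to the device is marked valid and has its register
state captured, and a NULL obj is recorded as a fact instead of
causing the ring to be skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_RINGS 3 /* stand-in for I915_NUM_RINGS */

/* Simplified model of a ring; not the real struct intel_ring_buffer. */
struct model_ring {
	const void *dev;   /* NULL if this ring slot was never set up */
	const void *obj;   /* ring buffer object; may be NULL */
	uint32_t head_reg; /* stand-in for a hardware register value */
};

/* Simplified model of the per-ring error record. */
struct model_error_ring {
	int valid;     /* slot was captured */
	bool has_obj;  /* whether obj was non-NULL at capture time */
	uint32_t head; /* register value seen at capture time */
};

/* Capture every ring that belongs to the device, even when obj is NULL. */
static int record_rings(const struct model_ring *rings,
			struct model_error_ring *err)
{
	int i, count = 0;

	for (i = 0; i < NUM_RINGS; i++) {
		const struct model_ring *ring = &rings[i];

		if (ring->dev == NULL)
			continue;

		err[i].valid = true;
		err[i].has_obj = ring->obj != NULL;
		err[i].head = ring->head_reg;
		count++;
	}

	return count;
}
```

The printing side can then key off valid alone, so a ring whose
software state changed after the reset still shows up in the error
state.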
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


