[Intel-gfx] [PATCH] drm/i915: kicking rings considered harmful

Chris Wilson chris at chris-wilson.co.uk
Tue Sep 27 23:54:01 CEST 2011


On Tue, 27 Sep 2011 12:38:59 -0700, Ben Widawsky <ben at bwidawsk.net> wrote:
> If we do this we lose the possibility to kick rings, but not reset the
> GPU (not that I find that terribly useful. If we do this, it does fire a
> wq event, but I don't see a problem with that for this case.
> 
> I think I would rather do this:
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 012732b..803524e 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1698,6 +1698,10 @@ void i915_hangcheck_elapsed(unsigned long data)
>                 if (dev_priv->hangcheck_count++ > 1) {
>                         DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
>  
> +                       /* Save off error state before kicking the rings and
> +                        * possibly ruining the GPU state.
> +                        */
> +                       i915_handle_error(dev, true);
>                         if (!IS_GEN2(dev)) {
>                                 /* Is the chip hanging on a WAIT_FOR_EVENT?
>                                  * If so we can simply poke the RB_WAIT bit
> @@ -1717,7 +1721,6 @@ void i915_hangcheck_elapsed(unsigned long data)
>                                         goto repeat;
>                         }
>  
> -                       i915_handle_error(dev, true);
>                         return;
>                 }
>         } else {

Interesting, if we simply call i915_capture_error_state() rather than move
the i195_handle_error() earlier we do in fact get the best of both worlds.

However, it doesn't address Daniel's statement that kick_rings() provoked
an unrecoverable hang and so we still need to disable that in order to
save the error-state. The origin of ring-kicking was to try and recover
from the modesetting/vsync issues, which apart from the outstanding issue
in intel_crtc_disable() are behind us. (I hope ;-) We shouldn't be relying
on i915_reset() and i915.reset=0 tends to be either deliberate or an act of
desparation so I don't see the issue in also preventing ring-kicking with
the same parameter. Is there an issue I'm overlooking?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre



More information about the Intel-gfx mailing list