[Intel-gfx] [PATCH] drm/i915: Advance seqno upon resetting the GPU following a hang

Daniel Vetter daniel at ffwll.ch
Wed May 8 16:02:00 CEST 2013


On Wed, May 08, 2013 at 02:29:30PM +0100, Chris Wilson wrote:
> There is an unlikely corner case whereby a lockless wait may not notice
> a GPU hang and reset, and so continue to wait for the device to advance
> beyond the chosen seqno. This of course may never happen as the waiter
> may be the only user. Instead, we can explicitly advance the device
> seqno to match the requests that are forcibly retired following the
> hang.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>

This race is why the reset counter must always increase and can't just
flip-flop between the reset-in-progress and everything-works states.

Now if we want to unwedge on resume we need to reconsider this, but imo it
would be easier to simply remember the reset counter from before we wedge
the GPU and restore that one (incremented as if the GPU reset had worked).
We already assume that wedged will never collide with a real reset counter
value, so this should work.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c |   15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 84ee1f2..b3c8abd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2118,8 +2118,11 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
>  }
>  
>  static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_priv,
> -				      struct intel_ring_buffer *ring)
> +				      struct intel_ring_buffer *ring,
> +				      u32 seqno)
>  {
> +	int i;
> +
>  	while (!list_empty(&ring->request_list)) {
>  		struct drm_i915_gem_request *request;
>  
> @@ -2139,6 +2142,10 @@ static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_priv,
>  
>  		i915_gem_object_move_to_inactive(obj);
>  	}
> +
> +	intel_ring_init_seqno(ring, seqno);
> +	for (i = 0; i < ARRAY_SIZE(ring->sync_seqno); i++)
> +		ring->sync_seqno[i] = 0;
>  }
>  
>  static void i915_gem_reset_fences(struct drm_device *dev)
> @@ -2167,10 +2174,14 @@ void i915_gem_reset(struct drm_device *dev)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_object *obj;
>  	struct intel_ring_buffer *ring;
> +	u32 seqno;
>  	int i;
>  
> +	if (i915_gem_get_seqno(dev, &seqno))
> +		seqno = dev_priv->next_seqno - 1;
> +
>  	for_each_ring(ring, dev_priv, i)
> -		i915_gem_reset_ring_lists(dev_priv, ring);
> +		i915_gem_reset_ring_lists(dev_priv, ring, seqno);
>  
>  	/* Move everything out of the GPU domains to ensure we do any
>  	 * necessary invalidation upon reuse.
> -- 
> 1.7.10.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
