[Intel-gfx] [PATCH] drm/i915: Fix erroneous dereference of batch_obj inside reset_status

Daniel Vetter daniel at ffwll.ch
Wed Dec 4 13:18:42 CET 2013


On Wed, Dec 04, 2013 at 11:37:09AM +0000, Chris Wilson wrote:
> As the rings may be processed and their requests deallocated in a
> different order to the natural retirement during a reset,
> 
> /* Whilst this request exists, batch_obj will be on the
>  * active_list, and so will hold the active reference. Only when this
>  * request is retired will the the batch_obj be moved onto the
>  * inactive_list and lose its active reference. Hence we do not need
>  * to explicitly hold another reference here.
>  */
> 
> is violated, and the batch_obj may be dereferenced after it had been
> freed on another ring. This can be simply avoided by processing the
> status update prior to deallocating any requests.
> 
> Fixes regression (a possible OOPS following a GPU hang) from
> commit aa60c664e6df502578454621c3a9b1f087ff8d25
> Author: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Date:   Wed Jun 12 15:13:20 2013 +0300
> 
>     drm/i915: find guilty batch buffer on ring resets
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> Cc: stable at vger.kernel.org
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 29 +++++++++++++++++++----------
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ec4502034203..c1e481d36575 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2442,15 +2442,24 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
>  	kfree(request);
>  }
>  
> -static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_priv,
> -				      struct intel_ring_buffer *ring)
> +static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv,
> +				       struct intel_ring_buffer *ring)
>  {
> -	u32 completed_seqno;
> -	u32 acthd;
> +	u32 completed_seqno = ring->get_seqno(ring, false);
> +	u32 acthd = intel_ring_get_active_head(ring);
> +	struct drm_i915_gem_request *request;
> +
> +	list_for_each_entry(request, &ring->request_list, list) {
> +		if (i915_seqno_passed(completed_seqno, request->seqno))
> +			continue;
>  
> -	acthd = intel_ring_get_active_head(ring);
> -	completed_seqno = ring->get_seqno(ring, false);
> +		i915_set_reset_status(ring, request, acthd);
> +	}
> +}

Indeed the fix in the gem reset code is a bit simpler than what I've
feared. We still have fairly tricky code which depends upon that implicit
reference in non-obvious ways. So I still think Mika's refcount patch with
the comments updated is the better approach.
-Daniel


>  
> +static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
> +					struct intel_ring_buffer *ring)
> +{
>  	while (!list_empty(&ring->request_list)) {
>  		struct drm_i915_gem_request *request;
>  
> @@ -2458,9 +2467,6 @@ static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_priv,
>  					   struct drm_i915_gem_request,
>  					   list);
>  
> -		if (request->seqno > completed_seqno)
> -			i915_set_reset_status(ring, request, acthd);
> -
>  		i915_gem_free_request(request);
>  	}
>  
> @@ -2503,7 +2509,10 @@ void i915_gem_reset(struct drm_device *dev)
>  	int i;
>  
>  	for_each_ring(ring, dev_priv, i)
> -		i915_gem_reset_ring_lists(dev_priv, ring);
> +		i915_gem_reset_ring_status(dev_priv, ring);
> +
> +	for_each_ring(ring, dev_priv, i)
> +		i915_gem_reset_ring_cleanup(dev_priv, ring);
>  
>  	i915_gem_cleanup_ringbuffer(dev);
>  
> -- 
> 1.8.5.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch



More information about the Intel-gfx mailing list