[Intel-gfx] [PATCH] drm/i915: Fix erroneous dereference of batch_obj inside reset_status
Daniel Vetter
daniel at ffwll.ch
Wed Dec 4 13:18:42 CET 2013
On Wed, Dec 04, 2013 at 11:37:09AM +0000, Chris Wilson wrote:
> As the rings may be processed and their requests deallocated in a
> different order to the natural retirement during a reset,
>
> /* Whilst this request exists, batch_obj will be on the
> * active_list, and so will hold the active reference. Only when this
> * request is retired will the the batch_obj be moved onto the
> * inactive_list and lose its active reference. Hence we do not need
> * to explicitly hold another reference here.
> */
>
> is violated, and the batch_obj may be dereferenced after it had been
> freed on another ring. This can be simply avoided by processing the
> status update prior to deallocating any requests.
>
> Fixes regression (a possible OOPS following a GPU hang) from
> commit aa60c664e6df502578454621c3a9b1f087ff8d25
> Author: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Date: Wed Jun 12 15:13:20 2013 +0300
>
> drm/i915: find guilty batch buffer on ring resets
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> Cc: stable at vger.kernel.org
> ---
> drivers/gpu/drm/i915/i915_gem.c | 29 +++++++++++++++++++----------
> 1 file changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ec4502034203..c1e481d36575 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2442,15 +2442,24 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
> kfree(request);
> }
>
> -static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_priv,
> - struct intel_ring_buffer *ring)
> +static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv,
> + struct intel_ring_buffer *ring)
> {
> - u32 completed_seqno;
> - u32 acthd;
> + u32 completed_seqno = ring->get_seqno(ring, false);
> + u32 acthd = intel_ring_get_active_head(ring);
> + struct drm_i915_gem_request *request;
> +
> + list_for_each_entry(request, &ring->request_list, list) {
> + if (i915_seqno_passed(completed_seqno, request->seqno))
> + continue;
>
> - acthd = intel_ring_get_active_head(ring);
> - completed_seqno = ring->get_seqno(ring, false);
> + i915_set_reset_status(ring, request, acthd);
> + }
> +}
Indeed the fix in the gem reset code is a bit simpler than what I've
feared. We still have fairly tricky code which depends upon that implicit
reference in non-obvious ways. So I still think Mika's refcount patch with
the comments updated is the better approach.
-Daniel
>
> +static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
> + struct intel_ring_buffer *ring)
> +{
> while (!list_empty(&ring->request_list)) {
> struct drm_i915_gem_request *request;
>
> @@ -2458,9 +2467,6 @@ static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_priv,
> struct drm_i915_gem_request,
> list);
>
> - if (request->seqno > completed_seqno)
> - i915_set_reset_status(ring, request, acthd);
> -
> i915_gem_free_request(request);
> }
>
> @@ -2503,7 +2509,10 @@ void i915_gem_reset(struct drm_device *dev)
> int i;
>
> for_each_ring(ring, dev_priv, i)
> - i915_gem_reset_ring_lists(dev_priv, ring);
> + i915_gem_reset_ring_status(dev_priv, ring);
> +
> + for_each_ring(ring, dev_priv, i)
> + i915_gem_reset_ring_cleanup(dev_priv, ring);
>
> i915_gem_cleanup_ringbuffer(dev);
>
> --
> 1.8.5.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list