[Intel-gfx] [RFC] drm/i915: reference count batch object on requests
Daniel Vetter
daniel at ffwll.ch
Tue Dec 3 18:10:05 CET 2013
On Mon, Dec 02, 2013 at 05:31:53PM +0200, Mika Kuoppala wrote:
> We used to lean on active_list to handle the references
> to batch objects. But there are useful cases when same,
> albeit simple, batch can be executing on multiple rings
> concurrently. For this case the active_list reference count
> handling is just not enough as batch could be freed by
> ring A request retirement as it is still running on ring B.
>
> Fix this by doing proper batch_obj reference counting.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>
>
> Notes:
> This is a patch which ameliorates the
> [PATCH] tests/gem_reset_stats: add close-pending-fork
>
> Chris wasn't happy about the refcounting as it might hide
> the true problem. But I haven't been able to find the real culprit,
> thus the RFC.
I think I understand the bug now, and your patch looks like the correct
fix. But we need to pimp the commit message.
In i915_gem_reset_ring_lists we reset requests and move objects to the
inactive list. Which means if the active list is the last one to hold a
reference, the object will disappear.
Now the problem is that we do this per-ring, and not in the order that the
objects would have been retired if the gpu wouldn't have hung. E.g. if a
batch is active on both ring 1&2 but was last active on ring 1, then we'd
free it before we go ahead with cleaning up the requests for ring 2.
So reference-counting the batch_obj pointers looks like the real fix.
Can you please amend your patch with this explanation and also please add
a backtrace that shows the crash? Might need to gather it with netconsole.
Thanks, Daniel
> ---
> drivers/gpu/drm/i915/i915_gem.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 40d9dcf..858538f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2145,13 +2145,12 @@ int __i915_add_request(struct intel_ring_buffer *ring,
> request->head = request_start;
> request->tail = request_ring_position;
>
> - /* Whilst this request exists, batch_obj will be on the
> - * active_list, and so will hold the active reference. Only when this
> - * request is retired will the the batch_obj be moved onto the
> - * inactive_list and lose its active reference. Hence we do not need
> - * to explicitly hold another reference here.
> + /* Active list has one reference but that is not enough as same
> + * batch_obj can be active on multiple rings
> */
> request->batch_obj = obj;
> + if (request->batch_obj)
> + drm_gem_object_reference(&request->batch_obj->base);
>
> /* Hold a reference to the current context so that we can inspect
> * it later in case a hangcheck error event fires.
> @@ -2340,6 +2339,9 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
> if (request->ctx)
> i915_gem_context_unreference(request->ctx);
>
> + if (request->batch_obj)
> + drm_gem_object_unreference(&request->batch_obj->base);
> +
> kfree(request);
> }
>
> --
> 1.7.9.5
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list