[Intel-gfx] [RFC] drm/i915: reference count batch object on requests

Chris Wilson chris at chris-wilson.co.uk
Wed Dec 4 12:24:02 CET 2013


On Tue, Dec 03, 2013 at 06:10:05PM +0100, Daniel Vetter wrote:
> On Mon, Dec 02, 2013 at 05:31:53PM +0200, Mika Kuoppala wrote:
> > We used to lean on active_list to handle the references
> > to batch objects. But there are useful cases when same,
> > albeit simple, batch can be executing on multiple rings
> > concurrently. For this case the active_list reference count
> > handling is just not enough as batch could be freed by
> > ring A request retirement as it is still running on ring B.
> > 
> > Fix this by doing proper batch_obj reference counting.
> > 
> > Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>
> > 
> > Notes:
> >     This is a patch which ameliorates the
> >     [PATCH] tests/gem_reset_stats: add close-pending-fork
> >     
> >     Chris wasn't happy about the refcounting as it might hide
> >     the true problem. But I haven't been able to find the real culprit,
> >     thus the RFC.
> 
> I think I understand the bug now, and your patch looks like the correct
> fix. But we need to pimp the commit message.
> 
> In i915_gem_reset_ring_lists we reset requests and move objects to the
> inactive list. Which means if the active list is the last one to hold a
> reference, the object will disappear.
> 
> Now the problem is that we do this per-ring, and not in the order that the
> objects would have been retired if the gpu wouldn't have hung. E.g. if a
> batch is active on both ring 1&2 but was last active on ring 1, then we'd
> free it before we go ahead with cleaning up the requests for ring 2.
> 
> So reference-counting the batch_obj pointers looks like the real fix.

No. The bug only exists in i915_gem_reset() and should not impact
anywhere else.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre



More information about the Intel-gfx mailing list