[Intel-gfx] [RFC] drm/i915: reference count batch object on requests
Chris Wilson
chris at chris-wilson.co.uk
Wed Dec 4 13:11:43 CET 2013
On Wed, Dec 04, 2013 at 01:08:56PM +0100, Daniel Vetter wrote:
> On Wed, Dec 4, 2013 at 12:24 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > On Tue, Dec 03, 2013 at 06:10:05PM +0100, Daniel Vetter wrote:
> >> On Mon, Dec 02, 2013 at 05:31:53PM +0200, Mika Kuoppala wrote:
> >> > We used to lean on active_list to handle the references
> >> > to batch objects. But there are useful cases when same,
> >> > albeit simple, batch can be executing on multiple rings
> >> > concurrently. For this case the active_list reference count
> >> > handling is just not enough as batch could be freed by
> >> > ring A request retirement as it is still running on ring B.
> >> >
> >> > Fix this by doing proper batch_obj reference counting.
> >> >
> >> > Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>
> >> >
> >> > Notes:
> >> > This is a patch which ameliorates the
> >> > [PATCH] tests/gem_reset_stats: add close-pending-fork
> >> >
> >> > Chris wasn't happy about the refcounting as it might hide
> >> > the true problem. But I haven't been able to find the real culprit,
> >> > thus the RFC.
> >>
> >> I think I understand the bug now, and your patch looks like the correct
> >> fix. But we need to pimp the commit message.
> >>
> >> In i915_gem_reset_ring_lists we reset requests and move objects to the
> >> inactive list. Which means if the active list is the last one to hold a
> >> reference, the object will disappear.
> >>
> >> Now the problem is that we do this per-ring, and not in the order that the
> >> objects would have been retired if the gpu wouldn't have hung. E.g. if a
> >> batch is active on both ring 1&2 but was last active on ring 1, then we'd
> >> free it before we go ahead with cleaning up the requests for ring 2.
> >>
> >> So reference-counting the batch_obj pointers looks like the real fix.
> >
> > No. The bug only exists in i915_gem_reset() and should not impact
> > anywhere else.
>
> Well fixing the bug in i915_gem_reset would be lots more work and
> rather fragile - we'd need to retire requests in the correct order.
> That level of fanciness is generally not something I like to see in
> less-tested code like the reset paths.
Nope, just the operations in the right order.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list