[Intel-gfx] [PATCH v2] drm/i915: Stop listening to request resubmission from the signaler kthread

Chris Wilson chris at chris-wilson.co.uk
Mon Dec 11 15:29:53 UTC 2017


Quoting Tvrtko Ursulin (2017-12-11 15:24:06)
> 
> On 11/12/2017 14:26, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-12-11 14:19:47)
> >>
> >> On 08/12/2017 12:10, Chris Wilson wrote:
> >>> The intent here was that we would be listening to
> >>> i915_gem_request_unsubmit in order to cancel the signaler quickly and
> >>> release the reference on the request. Cancelling the signaler is done
> >>> directly via intel_engine_cancel_signaling (called from unsubmit), but
> >>> that does not directly wake up the signaling thread, and neither does
> >>> setting the request->global_seqno back to zero wake up listeners to the
> >>> request->execute waitqueue. So the only time that listening to the
> >>> request->execute waitqueue would wake up the signaling kthread would be
> >>> on the request resubmission, during which time we would already receive
> >>> wake ups from rejoining the global breadcrumbs wait rbtree.
> >>>
> >>> Trying to wake up to release the request remains an issue. If the
> >>> signaling was cancelled and no other request required signaling, then it
> >>> is possible for us to shutdown with the reference on the request still
> >>> held. To ensure that we do not try to shutdown, leaking that request, we
> >>> kick the signaling threads whenever we disarm the breadcrumbs, i.e. on
> >>> parking the engine when idle.
> >>>
> >>> Fixes: d6a2289d9d6b ("drm/i915: Remove the preempted request from the execution queue")
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> >>> Cc: MichaƂ Winiarski <michal.winiarski at intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/intel_breadcrumbs.c | 18 +++++++++---------
> >>>    1 file changed, 9 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >>> index 5ae2d276f7f3..b826725989b1 100644
> >>> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >>> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >>> @@ -216,7 +216,7 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> >>>        struct intel_wait *wait, *n, *first;
> >>>    
> >>>        if (!b->irq_armed)
> >>> -             return;
> >>> +             goto wakeup_signaler;
> >>>    
> >>>        /* We only disarm the irq when we are idle (all requests completed),
> >>>         * so if the bottom-half remains asleep, it missed the request
> >>> @@ -239,6 +239,14 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> >>>        b->waiters = RB_ROOT;
> >>>    
> >>>        spin_unlock_irq(&b->rb_lock);
> >>> +
> >>> +     /*
> >>> +      * The signaling thread may be asleep holding a reference to a request,
> >>> +      * that had its signaling cancelled prior to being preempted. We need
> >>> +      * to kick the signaler, just in case, to release any such reference.
> >>> +      */
> >>> +wakeup_signaler:
> >>> +     wake_up_process(b->signaler);
> > 
> > One thing I was thinking about from our conversation, is whether we can
> > rely on the kthread_stop() kick for the final ref release? Whether it's
> > even a good idea to let the reference live for that long?
> 
> Probably not, feels awkward. In that sense previous version feels better.

Ok.
 
> > Basically it will mean replacing the GEM_BUG_ON(request) there with
> > another unref. But it will remove the complication and assoicated
> > commentary here.
> 
> Hm, and is there a test which was able to hit the GEM_BUG_ON? It needed 
> to go in the previous version as well, no?

It does indeed need to be there today, and with the v1 patch. Since
there's no serialisation to allow the previous wakeup to complete and
schedule() again before we hit the kthread_should_stop.
-Chris


More information about the Intel-gfx mailing list