[Intel-gfx] [PATCH 06/16] drm/i915/gt: Decouple completed requests on unwind

Chris Wilson chris at chris-wilson.co.uk
Wed Nov 25 10:21:46 UTC 2020


Quoting Tvrtko Ursulin (2020-11-25 09:15:25)
> 
> On 24/11/2020 17:31, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-11-24 17:13:02)
> >>
> >> On 24/11/2020 11:42, Chris Wilson wrote:
> >>> Since the introduction of preempt-to-busy, requests can complete in the
> >>> background, even while they are not on the engine->active.requests list.
> >>> As such, the engine->active.request list itself is not in strict
> >>> retirement order, and we have to scan the entire list while unwinding to
> >>> not miss any. However, if the request is completed we currently leave it
> >>> on the list [until retirement], but we could just as simply remove it
> >>> and stop treating it as active. We would only have to then traverse it
> >>> once while unwinding in quick succession.
> >>>
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++--
> >>>    drivers/gpu/drm/i915/i915_request.c | 3 ++-
> >>>    2 files changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index 30aa59fb7271..cf11cbac241b 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -1116,8 +1116,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
> >>>        list_for_each_entry_safe_reverse(rq, rn,
> >>>                                         &engine->active.requests,
> >>>                                         sched.link) {
> >>> -             if (i915_request_completed(rq))
> >>> -                     continue; /* XXX */
> >>> +             if (i915_request_completed(rq)) {
> >>> +                     list_del_init(&rq->sched.link);
> >>> +                     continue;
> >>> +             }
> >>>    
> >>>                __i915_request_unsubmit(rq);
> >>>    
> >>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> >>> index 8d7d29c9e375..a9db1376b996 100644
> >>> --- a/drivers/gpu/drm/i915/i915_request.c
> >>> +++ b/drivers/gpu/drm/i915/i915_request.c
> >>> @@ -321,7 +321,8 @@ bool i915_request_retire(struct i915_request *rq)
> >>>         * after removing the breadcrumb and signaling it, so that we do not
> >>>         * inadvertently attach the breadcrumb to a completed request.
> >>>         */
> >>> -     remove_from_engine(rq);
> >>> +     if (!list_empty(&rq->sched.link))
> >>> +             remove_from_engine(rq);
> >>
> >> The list_empty check is unlocked so is list_del_init in
> >> remove_from_engine safe on potentially already unlinked request or it
> >> needs to re-check under the lock?
> > 
> > It's safe. The unwind is under the lock, and remove_from_engine takes
> > the lock, and both do list_del_init() which is a no-op if already
> > removed. And the end state of the flag bits is the same on each path. We
> > can skip the __notify_execute_cb_imm() since we know in unwind it is
> > executing and there should be no cb.
> > 
> > The test before we take the lock is only allowed to skip the active.lock
> > if it sees the list is already decoupled, in which case we can leave it
> > to the unwind to remove it from the engine (and we know that the request
> > can only have been inflight prior to completion). Since the test is not
> > locked, we don't serialise with the removal, but the list_del_init is
> > the last action on the request so there is no window where the unwind is
> > accessing the request after it may have been retired.
> > 
> > list_move() will not confuse list_empty(), as although it does a
> > list_del_entry, it is not temporarily re-initialised to an empty list.
> 
> List_del_init is indeed safe. List_move.. which one you think can race 
> with retire? Preempt-to-busy unwinding an almost completed request yet 
> again? Or even preempt timeout racing with completion?

Here in unwind. We pass the completion check, but the request may still
be running and complete at any time (until we submit & ack the new ELSP).
So an unlocked list_empty check during retire can race with any of the
list_move during unwind and resubmit. (On resubmit, we check completed
under the lock and drop the request in __i915_request_submit which
should also leave it in a consistent state as if we had called
remove_from_engine.)
-Chris


More information about the Intel-gfx mailing list