[Intel-gfx] [PATCH 9/9] drm/i915/gt: Schedule request retirement when timeline idles
Chris Wilson
chris at chris-wilson.co.uk
Wed Nov 20 14:25:28 UTC 2019
Quoting Tvrtko Ursulin (2019-11-20 14:16:17)
>
> On 20/11/2019 13:39, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-11-20 13:16:51)
> >>
> >> On 20/11/2019 09:33, Chris Wilson wrote:
> >>> +
> >>> +		/*
> >>> +		 * Our goal here is to retire _idle_ timelines as soon as
> >>> +		 * possible (as they are idle, we do not expect userspace
> >>> +		 * to be cleaning up anytime soon).
> >>> +		 *
> >>> +		 * If the tl->active_count is already zero, someone else
> >>> +		 * should have retired the timeline. Equally if the timeline
> >>> +		 * is currently locked, either it is being retired elsewhere
> >>> +		 * or about to be!
> >>> +		 */
> >>> +		if (atomic_read(&tl->active_count) &&
> >>> +		    mutex_trylock(&tl->mutex)) {
> >>> +			retire_requests(tl);
> >>> +			mutex_unlock(&tl->mutex);
> >>> +		}
> >>> +		intel_timeline_put(tl);
> >>> +
> >>> +		GEM_BUG_ON(!next);
> >>> +		tl = ptr_mask_bits(next, 1);
> >>
> >> You sometimes expect engine->retire to contain 0x1?
> >
> > Yes, imagine that we are submitting so fast that we schedule_out
> > the same context before the worker has run; we would then try to
> > add_retire() the same timeline again. So I was using BIT(0) to tag an
> > active element in the retirement list.
>
> If we have schedule_out on the same context before the retire ran, then
> add_retire is a no-op due to tl->retire being set.
The first element in the list has timeline->retire = NULL, so we could not
tell that it was already queued; we only know it is on the list because we
make it timeline->retire = 1 (NULL | BIT(0)) instead.
> It could be a race between engine_retire and add_retire where
> engine->retire is set to NULL so the latter would set tl->retire to
> NULL | BIT(0).
>
> But BIT(0) is only filtered out and not acted upon anywhere so I am
> still lost.
>
> So maybe from the opposite angle, what goes wrong if you drop all BIT(0)
> business?
You insert the element into the list twice, causing a loop. Now that loop
is short-lived, as we clear timeline->retire on processing, but we may
still end up dropping a reference twice.
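
To make that concrete, here is a minimal, self-contained userspace sketch of
the tag-bit trick (C11 stdatomic standing in for try_cmpxchg/ptr_pack_bits;
the names mimic the i915 code but none of this is the kernel implementation).
Without the low bit, the first node pushed onto an empty list has next == 0
and looks exactly like an unqueued node, so a later schedule_out could push
it, and later drop its reference, a second time:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct node {
	_Atomic uintptr_t next;	/* next-pointer value, BIT(0) set while queued */
};

static _Atomic uintptr_t list_head;

static bool add_node(struct node *n)
{
	uintptr_t first;

	/* As in the patch, this check relies on the caller owning the node */
	if (atomic_load(&n->next))	/* already queued: the tag keeps this non-zero */
		return false;

	first = atomic_load(&list_head);
	do {	/* pack the tag into bit 0, like ptr_pack_bits(first, 1, 1) */
		atomic_store(&n->next, first | 1);
	} while (!atomic_compare_exchange_weak(&list_head, &first, (uintptr_t)n));

	return !first;	/* true if we transitioned the list from empty */
}

static void consume(void)
{
	struct node *n = (struct node *)atomic_exchange(&list_head, 0);

	while (n) {
		/* claim the node so that it may be queued again later */
		uintptr_t next = atomic_exchange(&n->next, 0);

		/* ... retire the timeline and drop its reference here ... */

		n = (struct node *)(next & ~(uintptr_t)1);	/* ptr_mask_bits(next, 1) */
	}
}

int main(void)
{
	struct node a = { 0 };

	if (!add_node(&a) || add_node(&a))	/* duplicate push must be rejected */
		return 1;
	consume();
	if (!add_node(&a))	/* may be queued again once consumed */
		return 1;
	printf("tag-bit llist sketch ok\n");
	return 0;
}

Building that with gcc -std=c11 and running it exercises exactly the
duplicate-push case above: the tagged tail keeps next non-zero, so the
second add is refused and only one reference is ever dropped.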
>
> >
> >>> +	} while (tl);
> >>> +}
> >>> +
> >>> +static bool add_retire(struct intel_engine_cs *engine,
> >>> +		       struct intel_timeline *tl)
> >>> +{
> >>> +	struct intel_timeline *first = READ_ONCE(engine->retire);
> >>> +
> >>> +	/*
> >>> +	 * We open-code a llist here to include the additional tag [BIT(0)]
> >>> +	 * so that we know when the timeline is already on a
> >>> +	 * retirement queue: either this engine or another.
> >>> +	 *
> >>> +	 * However, we rely on the fact that a timeline can only be active
> >>> +	 * on a single engine at any one time, and that add_retire() is
> >>> +	 * called before the engine releases the timeline and it is
> >>> +	 * transferred to another engine to retire.
> >>> +	 */
> >>> +
> >>> +	if (READ_ONCE(tl->retire)) /* already queued */
> >>
> >> Can't this go first in the function?
> >
> > Conceptually it is. And I made it so because I also decided against
> > having the READ_ONCE() at the top.
>
> But READ_ONCE is at the top, just a different one which is discarded if
> this one is NULL.
static bool add_retire(struct intel_engine_cs *engine,
		       struct intel_timeline *tl)
{
	struct intel_timeline *first;

	/*
	 * We open-code a llist here to include the additional tag [BIT(0)]
	 * so that we know when the timeline is already on a
	 * retirement queue: either this engine or another.
	 *
	 * However, we rely on the fact that a timeline can only be active
	 * on a single engine at any one time, and that add_retire() is
	 * called before the engine releases the timeline and it is
	 * transferred to another engine to retire.
	 */

	if (READ_ONCE(tl->retire)) /* already queued */
		return false;

	if (!atomic_read(&tl->active_count)) /* already retired */
		return false;

	intel_timeline_get(tl);
	first = READ_ONCE(engine->retire);
	do
		tl->retire = ptr_pack_bits(first, 1, 1);
	while (!try_cmpxchg(&engine->retire, &first, tl));

	return !first;
}
is the current incarnation.
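
For completeness, the consumer that pairs with it is the hunk quoted at the
top of this mail; the reconstruction below is only a sketch, and the
container_of()/worker hookup, the retire_work name and the two xchg()es are
my guess at the surrounding code rather than anything quoted here:

static void engine_retire(struct work_struct *work)
{
	struct intel_engine_cs *engine =
		container_of(work, typeof(*engine), retire_work);
	struct intel_timeline *tl = xchg(&engine->retire, NULL);

	do {
		/* Claim the element so that it may be queued again */
		struct intel_timeline *next = xchg(&tl->retire, NULL);

		/* Retire only if still active and uncontended */
		if (atomic_read(&tl->active_count) &&
		    mutex_trylock(&tl->mutex)) {
			retire_requests(tl);
			mutex_unlock(&tl->mutex);
		}
		intel_timeline_put(tl);

		/* BIT(0) guarantees next is never NULL for a queued element */
		GEM_BUG_ON(!next);
		tl = ptr_mask_bits(next, 1);
	} while (tl);
}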
> >>> +
> >>> +	intel_timeline_get(tl);
> >>> +	do
> >>> +		tl->retire = ptr_pack_bits(first, 1, 1);
> >>
> >> Here you rely on assignment being atomic right?
> >
> > Ish. Here we rely on the timeline being owned by the engine so it cannot
> > be submitted by another (and so schedule_out called) until this engine
> > has released it.
> >
> > It is a weak point for generality, but the ordering is strong in
> > execlists.
> >
> >>> +	while (!try_cmpxchg(&engine->retire, &first, tl));
> >>
> >> So the loop is effectively creating a chain of timelines to retire on
> >> this engine.
> >>
> >> What happens with virtual engines (well, or any single-timeline
> >> context) when a timeline goes to a different engine before the retire
> >> worker runs? Ah okay, it gets re-assigned to the most recent engine.
> >
> > Right. The engine_retire() doesn't care which engine the timelines were
> > run on; it's just a list of suspected idle timelines.
> >
> >> I am not sure about the BIT(0) business. It's always set on write so I
> >> am not getting why it is useful.
> >
> > It's also set to 0 on consumption :)
>
> Another thing - asserting that engine->retire is NULL after flush_work in
> intel_engine_fini_retire could be useful.
Yup, wilco.
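
Something along these lines, presumably (a sketch only; the
intel_engine_fini_retire and retire_work names follow the discussion above,
not the final patch):

void intel_engine_fini_retire(struct intel_engine_cs *engine)
{
	flush_work(&engine->retire_work);

	/* Everything queued must have been consumed by the final flush */
	GEM_BUG_ON(engine->retire);
}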
-Chris