[Intel-gfx] [PATCH 2/3] drm/i915/execlists: Minimalistic timeslicing
Chris Wilson
chris at chris-wilson.co.uk
Thu Jun 20 13:57:32 UTC 2019
Quoting Mika Kuoppala (2019-06-20 14:51:24)
> > +static void
> > +defer_request(struct i915_request * const rq, struct list_head * const pl)
> > +{
> > +        struct i915_dependency *p;
> > +
> > +        /*
> > +         * We want to move the interrupted request to the back of
> > +         * the round-robin list (i.e. its priority level), but
> > +         * in doing so, we must then move all requests that were in
> > +         * flight and were waiting for the interrupted request to
> > +         * be run after it again.
> > +         */
> > +        list_move_tail(&rq->sched.link, pl);
> > +
> > +        list_for_each_entry(p, &rq->sched.waiters_list, wait_link) {
> > +                struct i915_request *w =
> > +                        container_of(p->waiter, typeof(*w), sched);
> > +
> > +                /* Leave semaphores spinning on the other engines */
> > +                if (w->engine != rq->engine)
> > +                        continue;
> > +
> > +                /* No waiter should start before the active request completed */
> > +                GEM_BUG_ON(i915_request_started(w));
> > +
> > +                GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
> > +                if (rq_prio(w) < rq_prio(rq))
> > +                        continue;
> > +
> > +                if (list_empty(&w->sched.link))
> > +                        continue; /* Not yet submitted; unready */
> > +
> > +                /*
> > +                 * This should be very shallow as it is limited by the
> > +                 * number of requests that can fit in a ring (<64) and
>
> s/and/or ?
I think "and" works better as each context has their own ring, so it's a
multiplicative effect.
> > +                 * the number of contexts that can be in flight on this
> > +                 * engine.
> > +                 */
> > +                defer_request(w, pl);
>
> So the stack frame will be 64*(3*8 + preamble/return) in the worst case?
> That can be over 2k.
Ok, that makes it sound scary -- but we are still well within the 8k irq
stack limit. (Interrupt stacks now have 2 pages iirc, and even at 4k we
would be well within bounds.)
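For illustration, here's a rough back-of-the-envelope model of that worst
case (purely a sketch, not part of the patch; the per-frame figures are just
the estimates from this thread):

/* stack_estimate.c -- gcc -Wall -o stack_estimate stack_estimate.c */
#include <stdio.h>

/* Assumed figures from the discussion above, not measured values. */
#define MAX_DEPTH       64       /* depth used in the estimate (per-ring request limit) */
#define FRAME_LOCALS    (3 * 8)  /* rq, pl, p pointers at 8 bytes each */
#define FRAME_OVERHEAD  16       /* guessed preamble + return address */
#define IRQ_STACK       8192     /* the 8k irq stack mentioned above */

int main(void)
{
        unsigned int frame = FRAME_LOCALS + FRAME_OVERHEAD;
        unsigned int worst = MAX_DEPTH * frame;

        printf("worst case: %u frames x %u bytes = %u bytes (irq stack %u)\n",
               MAX_DEPTH, frame, worst, IRQ_STACK);

        return worst > IRQ_STACK; /* non-zero exit would mean trouble */
}

That works out to 64 x 40 = 2560 bytes, i.e. "over 2k" but comfortably
under the 8k limit.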
> > @@ -906,6 +982,27 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >                           */
> >                          last->hw_context->lrc_desc |= CTX_DESC_FORCE_RESTORE;
> >                          last = NULL;
> > +                } else if (need_timeslice(engine, last) &&
> > +                           !timer_pending(&engine->execlists.timer)) {
> > +                        GEM_TRACE("%s: expired last=%llx:%lld, prio=%d, hint=%d\n",
> > +                                  engine->name,
> > +                                  last->fence.context,
> > +                                  last->fence.seqno,
> > +                                  last->sched.attr.priority,
> > +                                  execlists->queue_priority_hint);
> > +
> > +                        ring_pause(engine) = 1;
> > +                        defer_active(engine);
> > +
> > +                        /*
> > +                         * Unlike for preemption, if we rewind and continue
> > +                         * executing the same context as previously active,
> > +                         * the order of execution will remain the same and
> > +                         * the tail will only advance. We do not need to
> > +                         * force a full context restore, as a lite-restore
> > +                         * is sufficient to resample the monotonic TAIL.
> > +                         */
>
> I would have asked about the force preemption without this fine comment.
>
> But this is similar to the other kind of preemption. So what happens
> when the contexts are not the same?
It's just a normal preemption event. The old ring regs are saved and we
don't try to scribble over them. Any future use of the old context will
have the same RING_TAIL as before, or a later one (from a new request), so
we will never try to program a backwards step.
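In case it helps, here's a tiny standalone sketch of the invariant I'm
relying on (toy naming of my own, nothing from the driver): per context the
ring tail only ever advances, so resubmitting the rewound context can only
ever program an equal or later RING_TAIL:

/* tail_monotonic.c -- gcc -Wall -o tail_monotonic tail_monotonic.c */
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a context's ring; "tail" plays the role of RING_TAIL. */
struct toy_context {
        unsigned int tail;
};

/* Emitting a request only ever advances the tail (ignoring wraparound). */
static void emit_request(struct toy_context *ce, unsigned int bytes)
{
        ce->tail += bytes;
}

/*
 * Resubmission after a timeslice rewind: resample the current tail.
 * The assert encodes "never program a backwards step".
 */
static unsigned int resubmit(const struct toy_context *ce,
                             unsigned int last_programmed)
{
        assert(ce->tail >= last_programmed);
        return ce->tail;
}

int main(void)
{
        struct toy_context ce = { .tail = 0 };
        unsigned int programmed = 0;

        emit_request(&ce, 256);
        programmed = resubmit(&ce, programmed);  /* first submission */

        emit_request(&ce, 128);                  /* new request, same context */
        programmed = resubmit(&ce, programmed);  /* lite-restore: tail only advanced */

        printf("RING_TAIL now %u\n", programmed);
        return 0;
}

Whether the same context or a different one ends up on the engine, the tail
we program is never behind what the hardware has already seen.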
-Chris