[Intel-gfx] [PATCH 08/11] drm/i915/execlists: Force preemption via reset on timeout

Chris Wilson chris at chris-wilson.co.uk
Tue Mar 27 09:23:26 UTC 2018


Quoting Tvrtko Ursulin (2018-03-27 09:51:20)
> 
> On 26/03/2018 12:50, Chris Wilson wrote:
> > +static enum hrtimer_restart preempt_timeout(struct hrtimer *hrtimer)
> > +{
> > +     struct intel_engine_execlists *execlists =
> > +             container_of(hrtimer, typeof(*execlists), preempt_timer);
> > +
> > +     GEM_TRACE("%s\n",
> > +               container_of(execlists,
> > +                            struct intel_engine_cs,
> > +                            execlists)->name);
> > +
> > +     queue_work(system_highpri_wq, &execlists->preempt_reset);
> 
> I suppose indirection from hrtimer to worker is for better than jiffie 
> timeout granularity? But then queue_work might introduce some delay to 
> defeat that.

Yes. It's betting on highpri_wq being just that. We can do better with
our own RT kthread and a wakeup from the hrtimer if required.

Hmm, the original plan (watchdog TDR) was to avoid using the global
reset entirely. The per-engine reset (although needs serialising with
itself) doesn't need struct_mutex, so we should be able to do that from
the timer directly, and just kick off the global reset on failure.

That sounds a whole lot better, let's dust off that code and see what
breaks.

> I am wondering if simply schedule_delayed_work directly wouldn't be good 
> enough. I suppose it is a question for the product group. But it is also 
> implementation detail.
> 
> > +     return HRTIMER_NORESTART;
> > +}
> > +
> > +static void preempt_reset(struct work_struct *work)
> > +{
> > +     struct intel_engine_cs *engine =
> > +             container_of(work, typeof(*engine), execlists.preempt_reset);
> > +
> > +     GEM_TRACE("%s\n", engine->name);
> > +
> > +     tasklet_disable(&engine->execlists.tasklet);
> > +
> > +     engine->execlists.tasklet.func(engine->execlists.tasklet.data);
> 
> Comment on why calling the tasklet directly.

/* ksoftirqd hates me; I hate ksoftirqd. */
 
> > +
> > +     if (execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
> > +             i915_handle_error(engine->i915, BIT(engine->id), 0,
> > +                               "preemption timed out on %s", engine->name);
> 
> Can this race with the normal reset and we end up wit i915_handle_error 
> twice simultaneously?

i915_handle_error() serialises itself against multiple calls to the same
set of engines and against a global reset.

There's a window for us to try and reset the same request twice (in
quick succession) though. That's not good.

Also I think we need some more hand holding here to provoke the right
reset. Hmm. We may need to break i915_handle_error() down quite a bit.

(Ok decided to try and not use i915_handle_error() here, at least not on
the immediate path.)

Hmm, going back to that discussion, we will have to dig through the
details on when exactly hrtimer comes from softirq, or else we may
deadlock with waiting on the tasklet. Hmm indeed.

> > +     tasklet_enable(&engine->execlists.tasklet);
> >   }
> >   
> >   static void complete_preempt_context(struct intel_engine_execlists *execlists)
> > @@ -542,6 +583,10 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
> >       execlists_cancel_port_requests(execlists);
> >       execlists_unwind_incomplete_requests(execlists);
> >   
> > +     /* If the timer already fired, complete the reset */
> > +     if (hrtimer_try_to_cancel(&execlists->preempt_timer) < 0)
> > +             return;
> 
> What about timer which had already fired and queued the worker? 
> hrtimer_try_to_cancel will return zero for that case I think.

That's -1. 0 is the timer already fired and executed. 1 is pending.

If the timer fired, executed its callback and then we execute, we
clear the pending flag and proceed as normal (also when we get called
from inside the worker after the hrtimer). The worker sees the cleared
flag in that case and skips the reset.
-Chris


More information about the Intel-gfx mailing list