[Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread

Thu Sep 26 15:08:56 UTC 2019

Quoting Tvrtko Ursulin (2019-09-26 15:57:07)
> 
> On 26/09/2019 15:25, Chris Wilson wrote:
> > Moving our primary irq handler to a RT thread incurs an extra 1us delay
> > in process interrupts. This is most notice in waking up client threads,
> > where it adds about 20% of extra latency. It also imposes a delay in
> > feeding the GPU, an extra 1us before signaling secondary engines and
> > extra latency in resubmitting work to keep the GPU busy. The latter case
> > is insignificant as the latency hidden by the active GPU, and
> > preempt-to-busy ensures that no extra latency is incurred for
> > preemption.
> > 
> > The benefit is that we reduced the impact on the rest of the system, the
> > cycletest shows a reduction from 5us mean latency to 2us, with the
> > maximum observed latency (in a 2 minute window) reduced by over 160us.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Clark Williams <williams at redhat.com>
> > Cc: Sebastian Andrzej Siewior <bigeasy at linutronix.de>
> > ---
> > Note this should need the fixes in
> > 20190926105644.16703-2-bigeasy at linutronix.de, by itself this should be a
> > test vehicle to exercise that patch!
> > ---
> >   drivers/gpu/drm/i915/i915_irq.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index bc83f094065a..f3df7714a3f3 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
> >   
> >       intel_irq_reset(dev_priv);
> >   
> > -     ret = request_irq(irq, intel_irq_handler(dev_priv),
> > -                       IRQF_SHARED, DRIVER_NAME, dev_priv);
> > +     ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
> > +                                IRQF_SHARED, DRIVER_NAME, dev_priv);
> >       if (ret < 0) {
> >               dev_priv->drm.irq_enabled = false;
> >               return ret;
> > 
> 
> Two questions:
> 
> 1. Should we split out the master_ctl handling into the fast handler? 
> Although can we pass anything from the fast to threaded handler? If not 
> we'd have to re-read the master_ctl from the threaded handler.

I did look at using the primary/threaded irq handler to do some
fast/slow handling (but RT is probably going to force the primary into a
thread, so long term prospect there looked bleak ;)

In our pre-gen11 model we could certainly do the ack in the fast
primary handler, pushing the processing into the secondary thread. Post
gen11 is more annoying and we probably only do the master_ctl
immediately. (It's very tempting to put GT handler into fast, but that
defeats the PREEMPT_RT objections ;)

We can accumulate into i915 anything we want to pass from primary to
secondary. Or we can use per-cpu handlers and data structures.

> 2. What about our tasklets - with threaded irqs we don't need them any 
> more, right? So in this case they just add additional latency.

Our tasklets run immediately within the RT thread. That appears to be a
positive outcome of this. We keep our direct submission so our initial
latency is not impacted at all, and the impact of the thread latency is
minimised by keeping the GPU busy. So the only (easily?) measurable
impact is client wakeup. (I have a modicum of concern that there is a
priority inversion here for an RT client waiting on the GPU.)
-Chris