[Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread

Chris Wilson chris at chris-wilson.co.uk
Thu Sep 26 19:31:09 UTC 2019


Quoting Brian Welty (2019-09-26 19:57:17)
> 
> On 9/26/2019 7:25 AM, Chris Wilson wrote:
> > Moving our primary irq handler to a RT thread incurs an extra 1us delay
> > in processing interrupts. This is most noticeable in waking up client
> > threads, where it adds about 20% extra latency. It also imposes a delay
> > in feeding the GPU: an extra 1us before signaling secondary engines and
> > extra latency in resubmitting work to keep the GPU busy. The latter case
> > is insignificant as the latency is hidden by the active GPU, and
> > preempt-to-busy ensures that no extra latency is incurred for
> > preemption.
> > 
> > The benefit is that we reduce the impact on the rest of the system:
> > cyclictest shows a reduction from 5us mean latency to 2us, with the
> > maximum observed latency (in a 2 minute window) reduced by over 160us.
> 
> Hi Chris,
> 
> I'm curious to understand this a little better.
> If the only benefit of this patch is improving overall system performance, then
> can you say why i915 interrupt handling doesn't fit into what I thought was
> the common usage of threaded interrupts...

The way cyclictest determines system impact is via latency in delivery
of a timer to userspace; it observes irq-off time. We use that as a
convenient metric to determine how long we block the system while
servicing the gpu.
 
> Usually with request_threaded_irq(), I thought both handlers are typically
> specified, so you'd only fall back to the threaded handler when the
> interrupt context handler is overwhelmed?
> Like so:
>         while (HW events need some action) {
>                 .... do something ...
>                 if (too overwhelmed)  /* ie. reduce load on system */
>                         return IRQ_WAKE_THREAD;
>         }
>         return IRQ_HANDLED;
> 
> Likely you considered something like the above.
> I'm just looking to understand why using only the threaded interrupt handler is best in this case.
> (Also maybe discuss this a little bit further in the commit message...?)

Convenience. The primary motivation for the patch was to test
Sebastian's patches, which target PREEMPT_RT issues, in our CI; he noted
that those issues would also be reproduced by a threaded irq handler.

The measurements I did were to assuage my own gut feeling that a
threaded irq handler would be devastating to our latency metrics
(although an extra 1us of latency is terrible if our goal happens to be
sub-1us gpu->client latency!). (I was contemplating adding IRQF_NO_THREAD
to avoid the issue.)
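
For reference, opting out of forced threading would just mean passing that
flag when we request the irq. A minimal sketch, assuming a hypothetical
handler name and registration site rather than the exact gen-specific code
we have today:

        #include <linux/interrupt.h>

        /* Keep the handler in hard irq context, even if the kernel would
         * otherwise force-thread it (threadirqs / PREEMPT_RT).
         */
        ret = request_irq(pdev->irq, i915_irq_handler,
                          IRQF_NO_THREAD,
                          "i915", dev_priv);
        if (ret)
                return ret;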

The problem I have with overflowing from the primary handler to the thread
is deciding what value counts as acceptable latency for the rest of the
system; and why bother to kick off a second thread if all it does is
service a tasklet -- would we be better served by a dedicated submission
thread? (Which might sound familiar -- it's what we used before tasklets.)
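
To make the overflow pattern concrete, it maps onto request_threaded_irq()
roughly like the sketch below. The helper names and the budget are
hypothetical, purely to illustrate the shape being discussed:

        static irqreturn_t i915_irq_primary(int irq, void *arg)
        {
                struct drm_i915_private *i915 = arg;
                unsigned int handled = 0;

                /* Service events in hard irq context, up to some budget... */
                while (i915_events_pending(i915)) {
                        i915_process_one_event(i915);
                        if (++handled > PRIMARY_BUDGET)
                                return IRQ_WAKE_THREAD; /* defer the rest */
                }

                return handled ? IRQ_HANDLED : IRQ_NONE;
        }

        static irqreturn_t i915_irq_thread(int irq, void *arg)
        {
                /* ...and finish whatever the primary handler left behind. */
                i915_drain_remaining_events(arg);
                return IRQ_HANDLED;
        }

        err = request_threaded_irq(pdev->irq,
                                   i915_irq_primary, i915_irq_thread,
                                   IRQF_SHARED, "i915", i915);

But that only pushes the question around: whatever budget you pick is an
implicit claim about how much irq-off time the rest of the system should
tolerate.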

It's definitely a tradeoff: our latency versus the rest of the system.
Though there were a few advantages to having an RT thread spawned for
the softirq (which I guess is because RT threads are granted a
frequency boost), and if we can reduce the number of irq-off spinlocks,
that may bring much greater advantages across the driver, outweighing
the cost of the extra context switch in client wakeup.
 
> Sorry if this was perhaps discussed in response to Tvrtko's question.
> I didn't quite follow the fast/slow separation mentioned in that thread.

Our interrupt handlers are effectively split into 2 phases. First we read
the master control and the various IIR, and ack the interrupt, after which
the GPU is free to raise an interrupt again. Then we process the
interrupt, handling the various bits asserted in the IIR (signaling
clients, feeding the GPU its next context, completing vblanks etc).
The first ack should be fast -- you can demonstrate the impact of not
doing the ack/handle split by measuring the throughput on multiple
engines (a context-switch event on the second engine will be noticeably
blocked by the interrupt handler on the first, if you are not careful).
(gen11+ breaks this split atm as we haven't worked out how we are meant
to ack the banked GT IIR yet, or at least no one has had an Icelake to
measure the impact on.)
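
In rough outline, the handler has this shape (a simplified sketch with
hypothetical helper names, not the exact gen-specific code):

        static irqreturn_t i915_irq_handler(int irq, void *arg)
        {
                struct drm_i915_private *i915 = arg;
                u32 master, iir[NUM_IRQ_BANKS];

                /* Phase 1: read + ack, after which the GPU may raise the
                 * next interrupt. This part must stay short.
                 */
                master = disable_and_read_master_irq(i915);
                if (!master)
                        return IRQ_NONE;
                ack_iir_banks(i915, master, iir);
                enable_master_irq(i915);

                /* Phase 2: act on the events we latched -- signal waiting
                 * clients, feed the GPU its next context, complete vblanks.
                 */
                handle_iir_banks(i915, master, iir);

                return IRQ_HANDLED;
        }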
-Chris

