[Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
Chris Wilson
chris at chris-wilson.co.uk
Mon Mar 23 09:45:38 UTC 2020
Quoting Francisco Jerez (2020-03-20 22:14:51)
> Francisco Jerez <currojerez at riseup.net> writes:
>
> > Chris Wilson <chris at chris-wilson.co.uk> writes:
> >
> >> We dropped calling process_csb prior to handling direct submission in
> >> order to avoid the nesting of spinlocks and lift process_csb() and the
> >> majority of the tasklet out of irq-off. However, we do want to avoid
> >> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> >> local to direct submission if we can acquire the tasklet's lock.
> >>
> >> v2: Tweak the balance to avoid over submitting lite-restores
> >>
> >> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >> Cc: Francisco Jerez <currojerez at riseup.net>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> >> ---
> >> drivers/gpu/drm/i915/gt/intel_lrc.c | 44 ++++++++++++++++++++------
> >> drivers/gpu/drm/i915/gt/selftest_lrc.c | 2 +-
> >> 2 files changed, 36 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> index f09dd87324b9..dceb65a0088f 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> @@ -2884,17 +2884,17 @@ static void queue_request(struct intel_engine_cs *engine,
> >> set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >> }
> >>
> >> -static void __submit_queue_imm(struct intel_engine_cs *engine)
> >> +static bool pending_csb(const struct intel_engine_execlists *el)
> >> {
> >> - struct intel_engine_execlists * const execlists = &engine->execlists;
> >> + return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
> >> +}
> >>
> >> - if (reset_in_progress(execlists))
> >> - return; /* defer until we restart the engine following reset */
> >> +static bool skip_lite_restore(struct intel_engine_execlists *el,
> >> + const struct i915_request *rq)
> >> +{
> >> + struct i915_request *inflight = execlists_active(el);
> >>
> >> - if (execlists->tasklet.func == execlists_submission_tasklet)
> >> - __execlists_submission_tasklet(engine);
> >> - else
> >> - tasklet_hi_schedule(&execlists->tasklet);
> >> + return inflight && inflight->context == rq->context;
> >> }
> >>
> >> static void submit_queue(struct intel_engine_cs *engine,
> >> @@ -2905,8 +2905,34 @@ static void submit_queue(struct intel_engine_cs *engine,
> >> if (rq_prio(rq) <= execlists->queue_priority_hint)
> >> return;
> >>
> >> + if (reset_in_progress(execlists))
> >> + return; /* defer until we restart the engine following reset */
> >> +
> >> + /*
> >> + * Suppress immediate lite-restores, leave that to the tasklet.
> >> + *
> >> + * However, we leave the queue_priority_hint unset so that if we do
> >> + * submit a second context, we push that into ELSP[1] immediately.
> >> + */
> >> + if (skip_lite_restore(execlists, rq))
> >> + return;
> >> +
> > Why do you need to treat lite-restore specially here?
Lite-restores have a noticeable impact on no-op loads. Part of that is
that a lite-restore takes about 1us, and the other part is that the
driver has a lot more work to do. There's a balance point around here
between not needlessly interrupting ourselves and ensuring that there is
no bubble.
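
To make the order of checks in the quoted hunk concrete, here is a
minimal userspace sketch. It is not the driver code: the model_* types
and names are hypothetical stand-ins for intel_engine_execlists and
i915_request, and SUBMIT_DIRECT as the fall-through outcome is an
assumption, since the rest of submit_queue() is not shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the i915 structures. */
struct model_context {
	int id;
};

struct model_request {
	const struct model_context *context;
	int prio;
};

struct model_execlists {
	const struct model_request *inflight; /* active request, or NULL if idle */
	int queue_priority_hint;
	bool reset_in_progress;
};

enum submit_action {
	SUBMIT_DEFER,  /* no immediate submission from this path */
	SUBMIT_DIRECT, /* assumed: attempt direct submission now */
};

/* Mirrors skip_lite_restore(): same context as what is already running. */
static bool model_skip_lite_restore(const struct model_execlists *el,
				    const struct model_request *rq)
{
	return el->inflight && el->inflight->context == rq->context;
}

/* Mirrors the order of the early returns in the quoted submit_queue(). */
static enum submit_action model_submit_queue(const struct model_execlists *el,
					     const struct model_request *rq)
{
	assert(el && rq);

	/* Not important enough to change what the hardware is doing. */
	if (rq->prio <= el->queue_priority_hint)
		return SUBMIT_DEFER;

	/* Defer until the engine is restarted following reset. */
	if (el->reset_in_progress)
		return SUBMIT_DEFER;

	/* Suppress immediate lite-restores, leave that to the tasklet. */
	if (model_skip_lite_restore(el, rq))
		return SUBMIT_DEFER;

	return SUBMIT_DIRECT;
}
```

In this model a request on the already-running context is deferred,
while a request on a different context falls through, which matches
the intent of pushing a second context into ELSP[1] immediately.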
> >
> > Anyway, trying this out now in combination with my patches.
> >
>
> This didn't seem to help (together with your other suggestion to move
> the overload accounting to __execlists_schedule_in/out). And it makes
> the current -5% SynMark OglMultithread regression with my series go down
> to -10%. My previous suggestion of moving the
> intel_gt_pm_active_begin() call to process_csb() when the submission is
> ACK'ed by the hardware does seem to help (and it roughly halves the
> OglMultithread regression), possibly because that way we're able to
> determine whether the first context was actually overlapping at the
> point that the second was received by the hardware -- I haven't tested
> it extensively yet though.
Grumble, it just seems like we are setting and clearing the flag on
completely unrelated events -- which I still think boils down to working
around latency in the driver. Or at least I hope there's an explanation
and bug to fix that improves responsiveness for all.
-Chris