[Intel-gfx] [RFC] drm/i915/bdw+: Do not emit user interrupts when not needed
chris at chris-wilson.co.uk
Fri Dec 18 04:28:36 PST 2015
On Fri, Dec 18, 2015 at 11:59:41AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> We can rely on context complete interrupt to wake up the waiters
> apart in the case where requests are merged into a single ELSP
> submission. In this case we inject MI_USER_INTERRUPTS in the
> ring buffer to ensure prompt wake-ups.
> This optimization has the effect on for example GLBenchmark
> Egypt off-screen test of decreasing the number of generated
> interrupts per second by a factor of two, and context switched
> by factor of five to six.
I half like it. Are the interrupts a limiting factor in this case though?
This should be ~100 waits/second with ~1000 batches/second, right? What
is the delay between request completion and client wakeup - difficult to
measure after you remove the user interrupt though! But I estimate it
should be on the order of just a few GPU cycles.
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 27f06198a51e..d9be878dbde7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -359,6 +359,13 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
> +static void execlists_emit_user_interrupt(struct drm_i915_gem_request *req)
> + struct intel_ringbuffer *ringbuf = req->ringbuf;
> + iowrite32(MI_USER_INTERRUPT, ringbuf->virtual_start + req->tail - 8);
> static int execlists_update_context(struct drm_i915_gem_request *rq)
> struct intel_engine_cs *ring = rq->ring;
> @@ -433,6 +440,12 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
> cursor->elsp_submitted = req0->elsp_submitted;
> + /*
> + * When merging requests make sure there is still
> + * something after each batch buffer to wake up waiters.
> + */
> + if (cursor != req0)
> + execlists_emit_user_interrupt(req0);
You may have already missed this instruction as you patch it, and keep
doing so as long as the context is resubmitted. I think to be safe, you
need to patch cursor as well. You could then MI_NOOP out the MI_USER_INTERRUPT
on the terminal request.
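
One possible reading of that suggestion, as a toy userspace model rather than
driver code: every request that is followed by another request in the same
submission gets MI_USER_INTERRUPT written before its tail, and the terminal
request is explicitly reset to MI_NOOP so that a resubmission never sees a
stale value. The toy_* names and the plain uint32_t array standing in for the
ring are assumptions for illustration only; the "tail - 8" offset is taken
from the patch above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command encodings as used by i915: opcode in bits 28:23. */
#define MI_NOOP           0u
#define MI_USER_INTERRUPT (0x02u << 23)

/* Hypothetical stand-in for a request: only the tail offset (in bytes) matters. */
struct toy_request {
	uint32_t tail;
};

/* Overwrite the dword 8 bytes before the request's tail, as the patch does. */
static void toy_patch_tail(uint32_t *ring, const struct toy_request *rq,
			   uint32_t cmd)
{
	assert(rq->tail >= 8 && rq->tail % 4 == 0);
	ring[(rq->tail - 8) / 4] = cmd;
}

/*
 * reqs[0..n-1] are merged into one submission; reqs[n-1] is the terminal
 * request, whose completion is signalled by the context complete interrupt.
 */
static void toy_patch_merged(uint32_t *ring, const struct toy_request *reqs,
			     size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i + 1 < n; i++)
		toy_patch_tail(ring, &reqs[i], MI_USER_INTERRUPT);
	toy_patch_tail(ring, &reqs[n - 1], MI_NOOP);
}
```

Writing the slot unconditionally on every submission is the point of the
sketch: the value in the ring then always matches the current merge state,
whatever an earlier submission left behind.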
An interesting igt experiment, I think, would be:
thread A, keep queuing batches with just a single MI_STORE_DWORD_IMM *addr
thread B, waits on batch from A, reads *addr (asynchronously), measures
latency (actual value - expected(batch))
Run for 10s, report min/max/median latency.
Repeat for more threads/contexts and more waiters. Ah, that may be the
demonstration for the thundering herd I've been looking for!
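
The reporting half of that experiment could be sketched as below. This covers
only the min/max/median summary, not the batch submission or the wait itself.
The latency_* names are hypothetical, and the samples are assumed to be
nanosecond latencies already collected by the waiter thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct latency_stats {
	uint64_t min, max, median;
};

/* qsort comparator for uint64_t that avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Sorts the samples in place and returns min, max and median.
 * For an even number of samples the upper of the two middle values is used.
 */
static struct latency_stats latency_summarise(uint64_t *samples, size_t n)
{
	struct latency_stats s;

	assert(n > 0);
	qsort(samples, n, sizeof(*samples), cmp_u64);
	s.min = samples[0];
	s.max = samples[n - 1];
	s.median = samples[n / 2];
	return s;
}
```

Running the same summary per waiter, and again across all waiters, would make
it easier to see whether the tail latency grows with the number of threads.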
Chris Wilson, Intel Open Source Technology Centre